Polypeptide encoded by a nucleotide sequence of a nontypeable strain of Haemophilus influenzae genome

ABSTRACT

The invention relates to the polynucleotide sequence of a nontypeable strain of Haemophilus influenzae (NTHi) and polypeptides encoded by the polynucleotides and uses thereof. The invention also relates to NTHi genes which are upregulated during or in response to NTHi infection of the middle ear and/or the nasopharynx.

RELATED APPLICATIONS

The present application claims priority benefit from U.S. Provisional Application 60/453,134 filed Mar. 6, 2003, which is incorporated herein by reference in its entirety.

Scientific work relating to the invention was supported by Grant No. DC03915 from the United States National Institutes of Health. The United States government may have certain rights in the invention.

The file copy of the sequence listing is submitted on a Compact-Disc Read Only Memory (CD-ROM). The sequence listing is saved as an ASCII text file named 38815A.txt (5140 KB), which was created on Mar. 5, 2004. The contents of the CD-ROM are incorporated herein by reference in their entirety.

FIELD OF THE INVENTION

The invention relates to the polynucleotide sequence of a nontypeable strain of Haemophilus influenzae (NTHi) genome, NTHi genes contained within the genome and polypeptides encoded by the polynucleotides. The invention also relates to uses of these NTHi polynucleotides and NTHi polypeptides including vaccines and methods of treating and preventing NTHi related disorders. The invention also relates to NTHi genes which are upregulated during or in response to NTHi infection of the middle ear or nasopharynx.

BACKGROUND

Otitis media (OM) is a highly prevalent pediatric disease worldwide and is the primary cause for emergency room visits by children (Infante-Rivand and Fernandez, Epidemiol. Rev., 15: 444-465, 1993). Recent statistics indicate that 24.5 million physician office visits were made for OM in 1990, representing a greater than 200% increase over those reported in the 1980's. While rarely associated with mortality any longer, the morbidity associated with OM is significant. Hearing loss is a common problem associated with this disease, oftentimes affecting a child's behavior, education and development of language skills (Baldwin, Am. J. Otol., 14: 601-604, 1993; Hunter et al., Ann. Otol. Rhinol. Laryngol. Suppl., 163: 59-61, 1994; Teele et al., J. Infect. Dis., 162: 685-694, 1990). The socioeconomic impact of OM is also great, with direct and indirect costs of diagnosing and managing OM exceeding $5 billion annually in the U.S. alone (Kaplan et al., Pediatr. Infect. Dis. J., 16: S9-11, 1997).

Whereas antibiotic therapy is common and the surgical placement of tympanostomy tubes has been successful in terms of draining effusions, clearing infection and relieving pain associated with the accumulation of fluids in the middle ear, the emergence of multiple antibiotic-resistant bacteria and the invasive nature of tube placement have illuminated the need for more effective and accepted approaches to the management and, preferably, the prevention of OM. Surgical management of chronic OM involves the insertion of tympanostomy tubes through the tympanic membrane while a child is under general anesthesia. While this procedure is commonplace (prevalence rates are ~13%; Bright et al., Am. J. Public Health, 83(7): 1026-8, 1993) and is highly effective in terms of relieving painful symptoms by draining the middle ear of accumulated fluids, it too has met with criticism due to the invasive nature of the procedure and its incumbent risks (Berman et al., Pediatrics, 93(3): 353-63, 1994; Bright et al., supra; Cimons, ASM News, 60: 527-528; Paap, Ann. Pharmacother., 30(11): 1291-7, 1996).

Progress in vaccine development is most advanced for Streptococcus pneumoniae, the primary causative agent of acute OM (AOM), as evidenced by the recent approval and release of a seven-valent capsular-conjugate vaccine, PREVNAR® (Eskola and Kilpi, Pediatr. Infect. Dis. J., 16: S72-78, 2000). While PREVNAR® has been highly efficacious for invasive pneumococcal disease, coverage for OM has been disappointing (6-8%) with reports of an increased number of OM cases due to serotypes not included in the vaccine (Black et al., Pediatr. Infect. Dis. J., 19: 187-195; Eskola et al., Pediatr. Infect. Dis. J., 19: S72-78, 2000; Eskola et al., N. Engl. J. Med., 344: 403-409, 2001; Snow et al., Otol. Neurotol., 23: 1-2, 2002). Less progress has been made for non-typeable Haemophilus influenzae (NTHi), the gram-negative pathogen that predominates in chronic OM with effusion (Klein, Pediatr. Infect. Dis. J., 16: S5-8, 1997; Spinola et al., J. Infect. Dis., 154: 100-109, 1986). Hampering development of effective vaccines against NTHi is the currently incomplete understanding of the pathogenesis of NTHi-induced middle ear disease. Contributing to this delay is a lack of understanding of the dynamic interplay between microbe-expressed virulence factors and the host's immune response as the disease progresses from one of host immunological tolerance of a benign nasopharyngeal commensal, to that of an active defensive reaction to an opportunistic invader of the normally sterile middle ear space.

Currently there is a poor understanding of how NTHi causes OM in children. The identification of putative virulence factors necessary for induction of OM will contribute significantly to the understanding of the host-pathogen interaction and, ultimately, the identification of potential vaccine candidates and targets of chemotherapy. There is a tremendous need to develop more effective and accepted approaches to the management and, preferably, the prevention of otitis media. Vaccine development is a very promising and cost effective method to accomplish this goal (Giebank, Pediatr. Infect. Dis. J., 13(11): 1064-8, 1994; Karma et al., Int. J. Pediatr. Otorhinolaryngol., 32(Suppl.): S127-34, 1995).

SUMMARY OF INVENTION

The present invention provides for the identification and characterization of the genomic sequence of NTHi H. influenzae strain 86-028NP and the polypeptide sequences encoded thereby. The 3-fold analysis of the NTHi genomic sequence is set out in a series of contig sequences denoted as SEQ ID NOS: 1-576, and the subsequent 8-fold analysis of the genomic sequence is set out in a series of 11 contig sequences denoted as SEQ ID NOS: 675-685. These contigs are raw data and one of skill in the art may assemble these contigs by comparing overlapping sequences to construct the complete genome of the NTHi strain 86-028NP using routine methods.
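The idea of joining contigs by comparing overlapping sequences can be illustrated with a minimal sketch. The function name `merge_by_overlap`, the `min_overlap` threshold and the two toy sequences are hypothetical and chosen only for illustration; they are not taken from the sequence listing. Practical genome assembly relies on dedicated software that also tolerates sequencing errors and considers reverse-complement orientation, neither of which is handled here.

```python
from typing import Optional


def merge_by_overlap(left: str, right: str, min_overlap: int = 4) -> Optional[str]:
    """Join two contigs when a suffix of `left` exactly matches a prefix of `right`.

    Returns the merged sequence, or None if no exact overlap of at least
    `min_overlap` bases exists. Exact matching only (illustrative assumption).
    """
    longest_possible = min(len(left), len(right))
    # Try the longest candidate overlap first so the merge is as compact as possible.
    for size in range(longest_possible, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    return None


# Toy sequences (hypothetical), sharing the six-base overlap "GTTAGC".
contig_a = "ATGGCGTACGTTAGC"
contig_b = "GTTAGCCTGAAATGA"
merged = merge_by_overlap(contig_a, contig_b)
print(merged)
```

Trying the longest overlap first is a simple design choice that avoids accepting a short chance match when a longer one is available.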

The present invention also provides for antibodies specific for the NTHi polypeptides of the invention. Methods of detecting NTHi bacteria in a human or in a sample, such as serum, sputum, ear fluid, blood, urine, lymphatic fluid and cerebrospinal fluid, are contemplated. These methods include detecting NTHi polynucleotides with specific polynucleotide probes or detecting NTHi polypeptides with specific antibodies. The invention also contemplates diagnostic kits which utilize these methods of detecting NTHi bacteria.

The present invention also contemplates methods of eliciting an immune response by administering a NTHi polypeptide of the invention or a NTHi peptide thereof. These methods include administering the NTHi polypeptide or NTHi peptide as a vaccine for treatment and/or prevention of diseases caused by NTHi infection, such as OM. The following NTHi genes are upregulated during or in response to middle ear and/or nasopharynx infections; and the polypeptides encoded by these genes and peptides thereof are contemplated as possible OM vaccine candidates and/or targets of chemotherapy: hisB, lppB, sapA, lolA, rbsC, purE, ribB, arcB, uxuA, dsbB, ureH, licC, HI1647, ispZ, radC, mukF, glpR, ihfB, argR, cspD, HI0094, HI1163, HI1063, HI0665, HI1292, HI1064. NTHi hisB gene is set out as nucleotide sequence SEQ ID NO: 615 and encodes the amino acid sequence set out as SEQ ID NO: 616. NTHi sapA gene is set out as nucleotide sequence SEQ ID NO: 617 and encodes the amino acid sequence set out as SEQ ID NO: 618. NTHi rbsC gene is set out as nucleotide sequence SEQ ID NO: 619 and encodes the amino acid sequence set out as SEQ ID NO: 620. NTHi purE gene is set out as nucleotide sequence SEQ ID NO: 621 and encodes the amino acid sequence set out as SEQ ID NO: 622. NTHi ribB gene is set out as nucleotide sequence SEQ ID NO: 623 and encodes the amino acid sequence set out as SEQ ID NO: 624. NTHi arcB gene is set out as nucleotide sequence SEQ ID NO: 625 and encodes the amino acid sequence set out as SEQ ID NO: 626. NTHi uxuA gene is set out as nucleotide sequence SEQ ID NO: 627 and encodes the amino acid sequence set out as SEQ ID NO: 628. NTHi dsbB gene is set out as nucleotide sequence SEQ ID NO: 629 and encodes the amino acid sequence set out as SEQ ID NO: 630. NTHi ureH gene is set out as nucleotide sequence SEQ ID NO: 631 and encodes the amino acid sequence set out as SEQ ID NO: 632. NTHi licC gene is set out as nucleotide sequence SEQ ID NO: 633 and encodes the amino acid sequence set out as SEQ ID NO: 634.
NTHi HI1647 gene is set out as nucleotide sequence SEQ ID NO: 635 and encodes the amino acid sequence set out as SEQ ID NO: 636. NTHi ispZ gene is set out as nucleotide sequence SEQ ID NO: 637 and encodes the amino acid sequence set out as SEQ ID NO: 638. NTHi radC gene is set out as nucleotide sequence SEQ ID NO: 639 and encodes the amino acid sequence set out as SEQ ID NO: 640. NTHi mukF gene is set out as nucleotide sequence SEQ ID NO: 641 and encodes the amino acid sequence set out as SEQ ID NO: 642. NTHi glpR gene is set out as nucleotide sequence SEQ ID NO: 643 and encodes the amino acid sequence set out as SEQ ID NO: 644. NTHi ihfB gene is set out as nucleotide sequence SEQ ID NO: 645 and encodes the amino acid sequence set out as SEQ ID NO: 646. NTHi argR gene is set out as nucleotide sequence SEQ ID NO: 647 and encodes the amino acid sequence set out as SEQ ID NO: 648. NTHi cspD gene is set out as nucleotide sequence SEQ ID NO: 649 and encodes the amino acid sequence set out as SEQ ID NO: 650. NTHi HI1163 gene is set out as nucleotide sequence SEQ ID NO: 651 and encodes the amino acid sequence set out as SEQ ID NO: 652. NTHi HI1063 gene is set out as nucleotide sequence SEQ ID NO: 653 and encodes the amino acid sequence set out as SEQ ID NO: 654. NTHi HI0665 gene is set out as nucleotide sequence SEQ ID NO: 655 and encodes the amino acid sequence set out as SEQ ID NO: 656. NTHi HI1292 gene is set out as nucleotide sequence SEQ ID NO: 657 and encodes the amino acid sequence set out as SEQ ID NO: 658.

The novel NTHi genes included in the polynucleotide sequences presented as SEQ ID NOS: 1-576, SEQ ID NOS: 675-685 and the nucleotide sequences set out in Tables 4 and 4B are also up-regulated during infection of the middle ear and/or the nasopharynx, and therefore are contemplated to encode OM vaccine candidates and/or targets of chemotherapy. In addition, the following NTHi genes are contemplated to be virulence-associated genes and therefore are contemplated to encode possible OM vaccine candidates and/or targets of chemotherapy: HI1386, HI1462, HI1369, lav, HI1598. NTHi HI1386 gene sequence is set out as SEQ ID NO: 659 and encodes the amino acid sequence set out as SEQ ID NO: 660. NTHi HI1462 gene sequence is set out as SEQ ID NO: 661 and encodes the amino acid sequence set out as SEQ ID NO: 662. NTHi HI1369 gene sequence is set out as SEQ ID NO: 665 and encodes the amino acid sequence set out as SEQ ID NO: 666. NTHi lav gene sequence is set out as SEQ ID NO: 663 and encodes the amino acid sequence set out as SEQ ID NO: 664. NTHi HI1598 gene sequence is set out as SEQ ID NO: 669 and SEQ ID NO: 671 and encodes the amino acid sequence set out as SEQ ID NO: 670 and SEQ ID NO: 672. Additional NTHi genes associated with virulence include the polynucleotide sequences presented as SEQ ID NO: 667 and SEQ ID NO: 673.

As a method of treating or preventing NTHi infection, the present invention contemplates administering a molecule that inhibits expression or the activity of the NTHi polypeptides which are upregulated or active during infection. In particular, the invention contemplates methods of treating or preventing NTHi infection comprising modulating NTHi protein expression by administering an antisense oligonucleotide that specifically binds to NTHi genes that are upregulated during NTHi infections; such genes include hisB, lppB, sapA, lolA, rbsC, purE, ribB, arcB, uxuA, dsbB, ureH, licC, HI1647, ispZ, radC, mukF, glpR, ihfB, argR, cspD, HI0094, HI1163, HI1063, HI0665, HI1292, HI1064. The invention also contemplates methods of treating or preventing NTHi infection comprising administering antibodies or small molecules that modulate the activity of the proteins encoded by these genes. The novel NTHi genes included in the polynucleotide sequences presented as SEQ ID NOS: 1-576, SEQ ID NOS: 675-685 and the nucleotide sequences set out in Tables 4 and 4B are also up-regulated during infection of the middle ear and/or the nasopharynx and therefore antisense oligonucleotides that specifically bind these polynucleotide sequences are also contemplated.

Polynucleotides and Polypeptides of the Invention

The present invention provides for the sequences of the NTHi strain 86-028NP genome. This genomic sequence is presented as a series of contig sequences denoted herein as “contigs 1-576”. Each contig is assigned a sequence identification number that correlates with its “contig number”. Therefore, the contigs of the present invention are set out as SEQ ID NOS: 1-576. These contig polynucleotide sequences may be assembled into the complete genome sequence of the NTHi strain 86-028NP using routine methods. Upon completion of 8-fold sequence analysis of the NTHi strain 86-028NP genome, the genomic sequence was assembled into 11 contigs which are denoted herein as SEQ ID NOS: 675-685.

The present invention provides for the NTHi polynucleotide sequences and open reading frames contained within the contigs of SEQ ID NOS: 1-576, SEQ ID NOS: 675-685 and the nucleotide sequences set out in Table 3B, Table 4B and Table 5. The present invention also provides for the polypeptide sequences encoded by the NTHi polynucleotides of the present invention such as the amino acid sequences set out in Table 3B, Table 4B and Table 5. The invention provides for polynucleotides that hybridize under stringent conditions to (a) the complement of the nucleotide sequences of SEQ ID NOS: 1-576, SEQ ID NOS: 675-685 and the nucleotide sequences set out in Table 3B, Table 4B and Table 5 herein; (b) a polynucleotide which is an allelic variant of any polynucleotides recited above; (c) a polynucleotide which encodes a species homolog of any of the proteins recited above; or (d) a polynucleotide that encodes a polypeptide comprising a specific domain or truncation of the NTHi polypeptides of the present invention.

The NTHi polynucleotides of the invention also include nucleotide sequences that are substantially equivalent to the polynucleotides recited above. Polynucleotides according to the invention can have, e.g., at least 65%, at least 70%, at least 75%, at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, or 89%, more typically at least 90%, 91%, 92%, 93%, or 94% and even more typically at least 95%, 96%, 97%, 98% or 99% sequence identity to the NTHi polynucleotides recited above.
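As an illustration of how a percent sequence identity figure can be computed, the following minimal sketch scores two sequences that have already been aligned, with "-" marking gaps. The function name `percent_identity` and the example sequences are hypothetical. The sketch assumes one common convention, in which identical positions are divided by the number of aligned columns and a gap opposite a base counts as a mismatch; alignment programs differ in how they define the denominator, so values from different tools need not agree.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: columns where both sequences carry a gap are
    ignored; a gap opposite a base is counted as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(aligned_a, aligned_b):
        if base_a == "-" and base_b == "-":
            continue  # column contains no residues at all
        compared += 1
        if base_a == base_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments: 8 identical columns out of 10.
identity = percent_identity("ATGCCGTA-A", "ATGACGTATA")
print(identity)
```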

Included within the scope of the nucleic acid sequences of the invention are nucleic acid sequence fragments that hybridize under stringent conditions to the NTHi nucleotide sequences of SEQ ID NOS: 1-576, SEQ ID NOS: 675-685 and the nucleotide sequences set out in Table 3B, Table 4B and Table 5 herein, or complements thereof, which fragment is greater than about 5 nucleotides, preferably 7 nucleotides, more preferably greater than 9 nucleotides and most preferably greater than 17 nucleotides. Fragments of, e.g., 15, 17, or 20 nucleotides or more that are selective for (i.e., specifically hybridize to) any one of the polynucleotides of the invention are contemplated. Probes capable of specifically hybridizing to a polynucleotide can differentiate NTHi polynucleotide sequences of the invention from other polynucleotide sequences in the same family of genes or can differentiate NTHi genes from other bacterial genes, and are preferably based on unique nucleotide sequences.

The term “stringent” is used to refer to conditions that are commonly understood in the art as stringent. Hybridization stringency is principally determined by temperature, ionic strength, and the concentration of denaturing agents such as formamide. Examples of stringent conditions for hybridization and washing are 0.015 M sodium chloride, 0.0015 M sodium citrate at 65-68° C. or 0.015 M sodium chloride, 0.0015 M sodium citrate, and 50% formamide at 42° C. See Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory, (Cold Spring Harbor, N.Y. 1989). More stringent conditions (such as higher temperature, lower ionic strength, higher formamide, or other denaturing agent) may also be used; however, the rate of hybridization will be affected. In instances wherein hybridization of deoxyoligonucleotides is concerned, additional exemplary stringent hybridization conditions include washing in 6×SSC, 0.05% sodium pyrophosphate at 37° C. (for 14-base oligos), 48° C. (for 17-base oligos), 55° C. (for 20-base oligos), and 60° C. (for 23-base oligos).
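The exemplary oligonucleotide wash temperatures recited above can be captured in a small lookup, sketched below. The names `WASH_TEMPERATURE_C` and `wash_temperature_c` are hypothetical. Only the four oligo lengths stated in the text are encoded; the sketch deliberately refuses to interpolate for other lengths, since the text gives no rule for them.

```python
# Exemplary wash temperatures (degrees C) in 6xSSC, 0.05% sodium pyrophosphate,
# keyed by oligonucleotide length in bases, as recited in the text above.
WASH_TEMPERATURE_C = {14: 37, 17: 48, 20: 55, 23: 60}


def wash_temperature_c(oligo_length: int) -> int:
    """Return the exemplary wash temperature for a listed oligo length.

    Raises ValueError for lengths the text does not list, rather than
    guessing an intermediate value.
    """
    try:
        return WASH_TEMPERATURE_C[oligo_length]
    except KeyError:
        raise ValueError(
            f"no exemplary wash temperature listed for {oligo_length}-base oligos"
        ) from None


for length in sorted(WASH_TEMPERATURE_C):
    print(f"{length}-base oligo: wash at {wash_temperature_c(length)} C")
```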

Other agents may be included in the hybridization and washing buffers for the purpose of reducing non-specific and/or background hybridization. Examples are 0.1% bovine serum albumin, 0.1% polyvinyl-pyrrolidone, 0.1% sodium pyrophosphate, 0.1% sodium dodecylsulfate (NaDodSO₄, SDS), ficoll, Denhardt's solution, sonicated salmon sperm DNA (or other non-complementary DNA), and dextran sulfate, although other suitable agents can also be used. The concentration and types of these additives can be changed without substantially affecting the stringency of the hybridization conditions. Hybridization experiments are usually carried out at pH 6.8-7.4; however, at typical ionic strength conditions, the rate of hybridization is nearly independent of pH. See Anderson et al., Nucleic Acid Hybridisation: A Practical Approach, Ch. 4, IRL Press Limited (Oxford, England). Hybridization conditions can be adjusted by one skilled in the art in order to accommodate these variables and allow DNAs of different sequence relatedness to form hybrids.

The sequences falling within the scope of the present invention are not limited to these specific sequences, but also include allelic and species variations thereof. Allelic and species variations can be routinely determined by comparing the sequence provided in SEQ ID NOS: 1-576, SEQ ID NOS: 675-685, and nucleotide sequences set out in Table 3B, Table 4B and Table 5 herein, preferably the open reading frames therein, a representative fragment thereof, or a nucleotide sequence at least 90% identical, preferably 95% identical, to the open reading frames within SEQ ID NOS: 1-576, SEQ ID NOS: 675-685 and the nucleotide sequences set out in Table 3B, Table 4B and Table 5 with a sequence from another isolate of the same species. Preferred computer program methods to determine identity and similarity between two sequences include, but are not limited to, the GCG program package, including GAP (Devereux et al., Nucl. Acid. Res., 12: 387, 1984; Genetics Computer Group, University of Wisconsin, Madison, Wis.), BLASTP, BLASTN, and FASTA (Altschul et al., J. Mol. Biol., 215: 403-410, 1990). The BLASTX program is publicly available from the National Center for Biotechnology Information (NCBI) and other sources (BLAST Manual, Altschul et al. NCB/NLM/NIH Bethesda, MD 20894; Altschul et al., supra). The well known Smith Waterman algorithm may also be used to determine identity.

Furthermore, to accommodate codon variability, the invention includes nucleic acid molecules coding for the same amino acid sequences as do the specific open reading frames (ORF) disclosed herein. In other words, in the coding region of an ORF, substitution of one codon for another codon that encodes the same amino acid is expressly contemplated.
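The effect of substituting one codon for another that encodes the same amino acid can be checked with a short sketch. The codon table below is the standard genetic code written in the conventional T, C, A, G ordering; the function name `translate` and the two 12-base fragments are hypothetical and are not taken from the sequence listing.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_sequence: str) -> str:
    """Translate a DNA coding sequence into one-letter amino acid codes."""
    if len(coding_sequence) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(
        CODON_TABLE[coding_sequence[i:i + 3]]
        for i in range(0, len(coding_sequence), 3)
    )


# Hypothetical fragments differing at the third position of three codons:
# GCT->GCC (Ala), AAA->AAG (Lys), CTG->TTA (Leu).
original = "ATGGCTAAACTG"
variant = "ATGGCCAAGTTA"
print(translate(original), translate(variant))
```

Both fragments translate to the same peptide even though they differ at four nucleotide positions, which is the sense in which such sequences code for the same amino acid sequence.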

The isolated polypeptides of the invention include, but are not limited to, a polypeptide comprising: the amino acid sequences encoded by the nucleotide sequences included within the polynucleotide sequences set out as SEQ ID NOS: 1-576, SEQ ID NOS: 675-685 and the nucleotide sequences set out in Table 3B, Table 4B and Table 5, or the corresponding full length or mature protein. The polypeptides of the invention include the amino acid sequences of SEQ ID NO: 616, SEQ ID NO: 618, SEQ ID NO: 620, SEQ ID NO: 622, SEQ ID NO: 624, SEQ ID NO: 626, SEQ ID NO: 628, SEQ ID NO: 630, SEQ ID NO: 632, SEQ ID NO: 634, SEQ ID NO: 636, SEQ ID NO: 638, SEQ ID NO: 640, SEQ ID NO: 642, SEQ ID NO: 644, SEQ ID NO: 646, SEQ ID NO: 648, SEQ ID NO: 650, SEQ ID NO: 652, SEQ ID NO: 654, SEQ ID NO: 656, SEQ ID NO: 658, SEQ ID NO: 660, SEQ ID NO: 662, SEQ ID NO: 664, SEQ ID NO: 666, SEQ ID NO: 668, SEQ ID NO: 670, SEQ ID NO: 672, SEQ ID NO: 674, SEQ ID NO: 687, SEQ ID NO: 689, SEQ ID NO: 691, SEQ ID NO: 693, SEQ ID NO: 695, SEQ ID NO: 697, SEQ ID NO: 699, SEQ ID NO: 701, SEQ ID NO: 703, SEQ ID NO: 705, SEQ ID NO: 707, SEQ ID NO: 709, SEQ ID NO: 711, SEQ ID NO: 713, SEQ ID NO: 715, SEQ ID NO: 717, SEQ ID NO: 719, SEQ ID NO: 721, SEQ ID NO: 723, SEQ ID NO: 725, SEQ ID NO: 727, SEQ ID NO: 729, SEQ ID NO: 731, SEQ ID NO: 733, SEQ ID NO: 735, SEQ ID NO: 737, SEQ ID NO: 739, SEQ ID NO: 741, SEQ ID NO: 743, SEQ ID NO: 745, SEQ ID NO: 747, SEQ ID NO: 749, SEQ ID NO: 751, SEQ ID NO: 753, SEQ ID NO: 755, SEQ ID NO: 757, SEQ ID NO: 759, SEQ ID NO: 761, SEQ ID NO: 763, SEQ ID NO: 765, SEQ ID NO: 767, SEQ ID NO: 769 or SEQ ID NO: 771 which are set out in Table 3B, Table 4B and Table 5 herein.

Polypeptides of the invention also include polypeptides preferably with biological or immunogenic activity that are encoded by: (a) an open reading frame contained within the nucleotide sequences set forth as SEQ ID NOS: 1-576, SEQ ID NOS: 675-685 and the nucleotide sequences set out in Table 3B, Table 4B and Table 5, or (b) polynucleotides that hybridize to the complement of the polynucleotides of (a) under stringent hybridization conditions.

The invention also provides biologically active or immunologically active variants of the amino acid sequences of the present invention; and “substantial equivalents” thereof (e.g., with at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, 86%, 87%, 88%, 89%, at least about 90%, 91%, 92%, 93%, 94%, typically at least about 95%, 96%, 97%, more typically at least about 98%, or most typically at least about 99% amino acid identity) that retain biological and/or immunogenic activity. Polypeptides encoded by allelic variants may have a similar, increased, or decreased activity compared to polypeptides encoded by the polynucleotides included within the nucleotide sequences presented in SEQ ID NOS: 1-576, SEQ ID NOS: 675-685 and the nucleotide sequences set out in Table 3B, Table 4B and Table 5 herein, and the polypeptides having an amino acid sequence set out in Table 3B, Table 4B and Table 5 herein.

NTHi peptides refer to fragments of the NTHi polypeptides encoded by the nucleotide sequences presented in SEQ ID NOS: 1-576, SEQ ID NOS: 675-685 or the nucleotide sequences set out in Table 3B, Table 4B and Table 5 herein, and the polypeptides having the amino acid sequences set out in Table 3B, Table 4B and Table 5 herein. The preferred NTHi peptides are biologically and/or immunologically active.

The present invention further provides isolated NTHi polypeptides or NTHi peptides encoded by the NTHi nucleic acid fragments of the present invention or by degenerate variants of the nucleic acid fragments of the present invention. The term “degenerate variant” refers to nucleotide fragments which differ from a nucleic acid fragment of the present invention (e.g., an ORF) by nucleotide sequence but, due to the degeneracy of the genetic code, encode an identical NTHi polypeptide sequence. Preferred nucleic acid fragments of the present invention are the ORFs that encode proteins.

The invention also provides for NTHi polypeptides with one or more conservative amino acid substitutions that do not affect the biological and/or immunogenic activity of the polypeptide. Alternatively, the NTHi polypeptides of the invention are contemplated to have conservative amino acid substitutions which may or may not alter biological activity. The term “conservative amino acid substitution” refers to a substitution of a native amino acid residue with a nonnative residue, including naturally occurring and nonnaturally occurring amino acids, such that there is little or no effect on the polarity or charge of the amino acid residue at that position. For example, a conservative substitution results from the replacement of a non-polar residue in a polypeptide with any other non-polar residue. Further, any native residue in the polypeptide may also be substituted with alanine, according to the methods of “alanine scanning mutagenesis”. Naturally occurring amino acids are characterized based on their side chains as follows: basic: arginine, lysine, histidine; acidic: glutamic acid, aspartic acid; uncharged polar: glutamine, asparagine, serine, threonine, tyrosine; and non-polar: phenylalanine, tryptophan, cysteine, glycine, alanine, valine, proline, methionine, leucine, norleucine, isoleucine. General rules for amino acid substitutions are set forth in Table 1 below.

TABLE 1
Amino Acid Substitutions

Original Residues    Exemplary Substitutions        Preferred Substitutions
Ala                  Val, Leu, Ile                  Val
Arg                  Lys, Gln, Asn                  Lys
Asn                  Gln                            Gln
Asp                  Glu                            Glu
Cys                  Ser, Ala                       Ser
Gln                  Asn                            Asn
Glu                  Asp                            Asn
Gly                  Pro, Ala                       Ala
His                  Asn, Gln, Lys, Arg             Arg
Ile                  Leu, Val, Met, Ala, Phe        Leu
Leu                  Norleucine, Ile, Val, Met      Leu
Lys                  Arg, 1,4 Diaminobutyric        Arg
Met                  Leu, Phe, Ile                  Leu
Phe                  Leu, Val, Ile, Ala, Tyr        Arg
Pro                  Ala                            Gly
Ser                  Thr, Ala, Cys                  Thr
Thr                  Ser                            Ser
Trp                  Tyr, Phe                       Tyr
Tyr                  Trp, Phe, Thr, Ser             Phe
Val                  Ile, Met, Leu, Phe, Ala        Leu
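The side-chain grouping given in the text preceding Table 1 (basic, acidic, uncharged polar, non-polar) can be turned into a simple same-class test, sketched below. The names `SIDE_CHAIN_CLASSES`, `side_chain_class` and `is_same_class_substitution`, and the three-letter code "Nle" for norleucine, are choices made for this illustration. This is only one possible criterion: Table 1 also lists substitutions that cross these classes (for example Cys to Ser), which this test would not flag as conservative.

```python
# Side-chain classes as listed in the text (three-letter residue codes).
SIDE_CHAIN_CLASSES = {
    "basic": {"Arg", "Lys", "His"},
    "acidic": {"Glu", "Asp"},
    "uncharged polar": {"Gln", "Asn", "Ser", "Thr", "Tyr"},
    "non-polar": {"Phe", "Trp", "Cys", "Gly", "Ala", "Val",
                  "Pro", "Met", "Leu", "Nle", "Ile"},
}


def side_chain_class(residue: str) -> str:
    """Return the side-chain class name for a three-letter residue code."""
    for class_name, members in SIDE_CHAIN_CLASSES.items():
        if residue in members:
            return class_name
    raise ValueError(f"unknown residue: {residue}")


def is_same_class_substitution(original: str, replacement: str) -> bool:
    """True when both residues fall in the same side-chain class."""
    return side_chain_class(original) == side_chain_class(replacement)


print(is_same_class_substitution("Leu", "Ile"))  # both non-polar
print(is_same_class_substitution("Asp", "Lys"))  # acidic vs basic
```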

Antisense polynucleotides complementary to the polynucleotides encoding the NTHi polypeptides are also provided.

The invention contemplates that polynucleotides of the invention may be inserted in a vector for amplification or expression. For expression, the polynucleotides are operatively linked to appropriate expression control sequences such as a promoter and polyadenylation signal sequences. Further provided are cells comprising polynucleotides of the invention. Exemplary prokaryotic hosts include bacteria such as E. coli, Bacillus, Streptomyces, Pseudomonas, Salmonella and Serratia.

The term “isolated” refers to a substance removed from, and essentially free of, the other components of the environment in which it naturally exists. For example, a polypeptide is separated from other cellular proteins or a DNA is separated from other DNA flanking it in a genome in which it naturally occurs.

Antibodies and Methods for Eliciting an Immune Response

The invention provides antibodies which bind to antigenic epitopes unique to (i.e., are specific for) NTHi polypeptides. Also provided are antibodies which bind to antigenic epitopes common among multiple H. influenzae subtypes but unique with respect to any other antigenic epitopes. The antibodies may be polyclonal antibodies, monoclonal antibodies, antibody fragments which retain their ability to bind their unique epitope (e.g., Fv, Fab and F(ab)2 fragments), single chain antibodies and human or humanized antibodies. Antibodies may be generated by techniques standard in the art.

It is known in the art that antibodies to the capsular polysaccharide of H. influenzae exhibit the ability to kill bacteria in in vitro assays. These antibodies are also known to protect against challenge with H. influenzae in animal model systems. These studies indicate that antibodies to the capsular polysaccharides are likely to elicit a protective immune response in humans. The present invention provides for antibodies specific for the NTHi polypeptides of the present invention and fragments thereof, which exhibit the ability both to kill H. influenzae bacteria and to protect humans from NTHi infection. The present invention also provides for antibodies specific for the NTHi polypeptides of the invention which reduce the virulence, inhibit adherence, inhibit cell division, and/or inhibit penetration into the epithelium of H. influenzae bacteria or enhance phagocytosis of the H. influenzae bacteria.

In vitro complement mediated bactericidal assay systems (Musher et al., Infect. Immun. 39: 297-304, 1983; Anderson et al., J. Clin. Invest. 51: 31-38, 1972) may be used to measure the bactericidal activity of anti-NTHi antibodies. Further data on the ability of NTHi polypeptides and NTHi peptides to elicit a protective antibody response may be generated by using animal models of infection such as the chinchilla model system described herein.

It is also possible to confer short-term protection to a host by passive immunotherapy via the administration of pre-formed antibody against an epitope of NTHi, such as antibodies against NTHi OMP, LOS or noncapsular proteins. Thus, the contemplated vaccine formulations can be used to produce antibodies for use in passive immunotherapy. Human immunoglobulin is preferred in human medicine because a heterologous immunoglobulin may provoke an immune response to its foreign immunogenic components. Such passive immunization could be used on an emergency basis for immediate protection of unimmunized individuals exposed to special risks. Alternatively, these antibodies can be used in the production of anti-idiotypic antibody, which in turn can be used as an antigen to stimulate an immune response against NTHi epitopes.

The invention contemplates methods of eliciting an immune response to NTHi in an individual. These methods include immune responses which kill the NTHi bacteria and immune responses which block H. influenzae attachment to cells. In one embodiment, the methods comprise a step of administering an immunogenic dose of a composition comprising a NTHi protein or NTHi peptide of the invention. In another embodiment, the methods comprise administering an immunogenic dose of a composition comprising a cell expressing a NTHi protein or NTHi peptide of the invention. In yet another embodiment, the methods comprise administering an immunogenic dose of a composition comprising a polynucleotide encoding a NTHi protein or NTHi peptide of the invention. The polynucleotide may be a naked polynucleotide not associated with any other nucleic acid or may be in a vector such as a plasmid or viral vector (e.g., adeno-associated virus vector or adenovirus vector). Administration of the compositions may be by routes standard in the art, for example, parenteral, intravenous, oral, buccal, nasal, pulmonary, rectal, or vaginal. The methods may be used in combination in a single individual. The methods may be used prior or subsequent to NTHi infection of an individual.

An “immunological dose” is a dose which is adequate to produce an antibody and/or T cell immune response to protect said individual from NTHi infection, particularly NTHi infection of the middle ear and/or the nasopharynx or lower airway. Also provided are methods whereby such immunological response slows bacterial replication. A further aspect of the invention relates to an immunological composition which, when introduced into an individual capable of having induced within it an immunological response, induces such a response. The immunological response may be used therapeutically or prophylactically and may take the form of antibody immunity or cellular immunity such as that arising from CTL or CD4+ T cells. A NTHi protein or an antigenic peptide thereof may be fused with a co-protein which may not by itself produce antibodies, but is capable of stabilizing the first protein and producing a fused protein which will have immunogenic and protective properties. Thus the fused recombinant protein preferably further comprises an antigenic co-protein, such as Glutathione-S-transferase (GST) or beta-galactosidase, relatively large co-proteins which solubilize the protein and facilitate production and purification thereof. Moreover, the co-protein may act as an adjuvant in the sense of providing a generalized stimulation of the immune system. The co-protein may be attached to either the amino or carboxy terminus of the first protein. Provided by this invention are compositions, particularly vaccine compositions, and methods comprising the NTHi polypeptides encoded by the polynucleotide of the invention or antigenic peptides thereof.

The invention correspondingly provides compositions suitable for eliciting an immune response to NTHi infection, wherein the antibodies elicited block binding of NTHi bacterium to the host's cells. The compositions comprise NTHi proteins or NTHi peptides of the invention, cells expressing the NTHi polypeptide, or polynucleotides encoding the polypeptides. The compositions may also comprise other ingredients such as carriers and adjuvants.

Genes that are up-regulated in NTHi infection of the middle ear and/or the nasopharynx and genes that are associated with NTHi virulence are described herein. The polypeptides and peptides thereof which are encoded by these NTHi genes are contemplated to be useful for eliciting an immune response for treating or preventing disorders associated with NTHi infection, such as OM. Some of the polypeptides encoded by these genes include: histidine biosynthesis protein, lipoprotein B, peptide ABC transporter, periplasmic SapA precursor, outer membrane lipoproteins carrier protein precursor, ribose transport system permease protein, phosphoribosylaminoimidazole carboxylase catalytic subunit, PurE, Phosphoribosylaminoimidazole carboxylase catalytic subunit, ornithine carbamoyltransferase, mannonate dehydratase, disulfide oxidoreductase, urease accessory protein, phosphocholine cytidylyltransferase, putative pyridoxine biosynthesis protein, singlet oxygen resistance protein, intracellular septation protein, DNA repair protein, MukF protein, glycerol-3-phosphate regulon repressor, integration host factor beta subunit, arginine repressor, cold shock like protein, stress response protein, LicA, MukF, RadA and those hypothetical proteins encoded by HI0094, HI1163, HI0665, HI1292, HI1064, HI1386, HI0352 genes. NTHi OMPs, LOS and noncapsular proteins are also contemplated to elicit an immune response for prevention and treatment of disorders associated with NTHi infection.

An “immunogenic dose” of a composition of the invention is one that generates, after administration, a detectable humoral and/or cellular immune response in comparison to the immune response detectable before administration or in comparison to a standard immune response before administration. The invention contemplates that the immune response resulting from the methods may be protective and/or therapeutic.

The invention includes methods of blocking binding of NTHi bacteria to host cells in an individual. The methods comprise administering antibodies or polypeptides of the invention that block NTHi cellular attachment. Alternatively, administration of one or more small molecules that block NTHi cell attachment is contemplated. In vitro assays may be used to demonstrate the ability of an antibody, polypeptide or small molecule of the invention to block NTHi cell attachment.

Pharmaceutical compositions comprising antibodies of the invention, polypeptides of the invention and/or small molecules of the invention that block NTHi cellular attachment are provided. The pharmaceutical compositions may consist of one of the foregoing active ingredients alone, may comprise combinations of the foregoing active ingredients or may comprise additional active ingredients used to treat bacterial infections. The pharmaceutical compositions may comprise one or more additional ingredients such as pharmaceutically effective carriers. Dosage and frequency of the administration of the pharmaceutical compositions are determined by standard techniques and depend, for example, on the weight and age of the individual, the route of administration, and the severity of symptoms. Administration of the pharmaceutical compositions may be by routes standard in the art, for example, parenteral, intravenous, oral, buccal, nasal, pulmonary, rectal, or vaginal.

Also provided by the invention are methods for detecting NTHi infection in an individual. In one embodiment, the methods comprise detecting NTHi polynucleotides of the invention in a sample using primers or probes that specifically bind to the polynucleotides. Detection of the polynucleotide may be accomplished by numerous techniques routine in the art involving, for example, hybridization and PCR.

The antibodies of the present invention may also be used to provide reagents for use in diagnostic assays for the detection of NTHi antigens (NTHi polypeptides and peptides thereof) in various body fluids of individuals suspected of H. influenzae infection. In another embodiment, the NTHi proteins and peptides of the present invention may be used as antigens in immunoassays for the detection of NTHi in various patient tissues and body fluids including, but not limited to: blood, serum, ear fluid, spinal fluid, sputum, urine, lymphatic fluid and cerebrospinal fluid. The antigens of the present invention may be used in any immunoassay system known in the art including, but not limited to: radioimmunoassays, ELISA assays, sandwich assays, precipitin reactions, gel diffusion precipitin reactions, immunodiffusion assays, agglutination assays, fluorescent immunoassays, protein A immunoassays and immunoelectrophoresis assays.

Vaccines and Chemotherapeutic Targets

An aspect of the invention relates to a method for inducing an immunological response in an individual, particularly a mammal, which comprises inoculating the individual with a NTHi antigen protein or an antigenic peptide thereof.

The present invention also provides for vaccine formulations which comprise an immunogenic recombinant NTHi protein or NTHi peptide of the invention together with a suitable carrier. The NTHi polypeptides and peptides thereof contemplated as vaccine candidates and/or targets of chemotherapy include, but are not limited to, histidine biosynthesis protein, lipoprotein B, peptide ABC transporter, periplasmic SapA precursor, outer membrane lipoproteins carrier protein precursor, ribose transport system permease protein, phosphoribosylaminoimidazole carboxylase catalytic subunit, PurE, 3,4-dihydroxy-2-butanone 4-phosphate synthase, ornithine carbamoyltransferase, mannonate dehydratase, disulfide oxidoreductase, urease accessory protein, phosphocholine cytidylyltransferase, putative pyridoxine biosynthesis protein, singlet oxygen resistance protein, intracellular septation protein, DNA repair protein, MukF protein, glycerol-3-phosphate regulon repressor, integration host factor beta subunit, arginine repressor, cold shock like protein, stress response protein, LicA, RadA and those hypothetical proteins encoded by HI0094, HI1163, HI0665, HI1292, HI1064, HI1386, HI0352 genes, NTHi OMPs, NTHi LOS and NTHi noncapsular proteins and polypeptides encoded by the novel NTHi polynucleotide sequences present in the nucleotide sequences set out as SEQ ID NOS: 1-576, SEQ ID NOS: 675-685 and the nucleotide sequences set out in Table 3B, Table 4B and Table 5 herein, and the polypeptides having the amino acid sequences set out in Table 3B, Table 4B and Table 5 herein.

Since the protein may be broken down in the stomach, it is preferably administered parenterally, including, for example, administration that is subcutaneous, intramuscular, intravenous, or intradermal. Formulations suitable for parenteral administration include aqueous and non-aqueous sterile injection solutions which may contain anti-oxidants, buffers, bacteriostats and solutes which render the formulation isotonic with the bodily fluid, preferably the blood, of the individual; and aqueous and non-aqueous sterile suspensions which may include suspending agents or thickening agents. The formulations may be presented in unit-dose or multi-dose containers, for example, sealed ampules and vials, and may be stored in a freeze-dried condition requiring only the addition of the sterile liquid carrier immediately prior to use. The vaccine formulation may also include adjuvant systems for enhancing the immunogenicity of the formulation, such as oil-in-water systems and other systems known in the art. The dosage will depend on the specific activity of the vaccine and can be readily determined by routine experimentation.

A. Peptide Vaccines

Peptide therapeutic agents, such as peptide vaccines, are well known in the art and are of increasing use in the pharmaceutical arts. Consistent drawbacks to the parenteral administration of such peptide compounds have been the rapidity of breakdown or denaturation. Infusion pumps, as well as wax or oil implants, have been employed for chronic administration of therapeutic agents in an effort to both prolong the presence of peptide-like therapeutic agents and preserve the integrity of such agents. Furthermore, the peptide-like agent should (with particular reference to each epitope of the peptide-like agent) ideally maintain native state configuration for an extended period of time and additionally be presented in a fashion suitable for triggering an immunogenic response in the challenged animal or immunized human.

The NTHi antigenic peptides of the invention can be prepared in a number of conventional ways. The short peptide sequences can be prepared by chemical synthesis using standard means. Particularly convenient are solid phase techniques (see, e.g., Erikson et al., The Proteins (1976) v. 2, Academic Press, New York, p. 255). Automated solid phase synthesizers are commercially available. In addition, modifications in the sequence are easily made by substitution, addition or omission of appropriate residues. For example, a cysteine residue may be added at the carboxy terminus to provide a sulfhydryl group for convenient linkage to a carrier protein, or spacer elements, such as an additional glycine residue, may be incorporated into the sequence between the linking amino acid at the C-terminus and the remainder of the peptide. The short NTHi peptides can also be produced by recombinant techniques. The coding sequence for peptides of this length can easily be synthesized by chemical techniques, e.g., the phosphotriester method described in Matteucci et al., J. Am. Chem. Soc., 103: 3185 (1981).

Because some of the NTHi peptide sequences contemplated herein may be considered too small to be immunogenic, they may be linked to carrier substances in order to confer this property upon them. Any method of creating such linkages known in the art may be used. Linkages can be formed with heterobifunctional agents that generate a disulfide link at one functional group end and a peptide link at the other, such as a disulfide amide forming agent, e.g., N-succinimidyl-3-(2-pyridyldithio) propionate (SPDP) (See, e.g., Jansen et al., Immun. Rev. 62:185, 1982) and bifunctional coupling agents that form a thioether rather than a disulfide linkage such as reactive esters of 6-maleimidocaproic acid, 2-bromoacetic acid, 2-iodoacetic acid, 4-(N-maleimido-methyl) cyclohexane-1-carboxylic acid and the like, and coupling agents which activate carboxyl groups by combining them with succinimide or 1-hydroxy-2-nitro-4-sulfonic acid, sodium salt, such as succinimidyl 4-(N-maleimido-methyl) cyclohexane-1-carboxylate (SMCC).

B. Vaccine Compositions and Administration

A priming dose of the immunogen followed by one or more booster exposures to the immunogen may be necessary for an effective vaccine (Kramp et al., Infect. Immun., 25: 771-773, 1979; Davis et al., Immunology Letters, 14: 341-8, 1986-1987). Examples of proteins or polypeptides that could beneficially enhance the immune response if co-administered include cytokines (e.g., IL-2, IL-12, GM-CSF), cytokine-inducing molecules (e.g., LeIF) or costimulatory molecules. Helper (HTL) epitopes could be joined to intracellular targeting signals and expressed separately from the CTL epitopes. This would allow direction of the HTL epitopes to a cell compartment different than the CTL epitopes. If required, this could facilitate more efficient entry of HTL epitopes into the MHC class II pathway, thereby improving CTL induction. In contrast to CTL induction, specifically decreasing the immune response by co-expression of immunosuppressive molecules (e.g., TGF-β) may be beneficial in certain diseases.

Ideally, an immunogen will exhibit two properties: the capacity to stimulate the formation of the corresponding antibodies and the propensity to react specifically with these antibodies. Immunogens bear one or more epitopes, which are the smallest part of an immunogen recognizable by the combining site of an antibody. In particular instances, the immunogen, fractions of immunogens or the conditions under which the immunogen is presented are inadequate to precipitate the desired immunological response, resulting in insufficient immunity. This is often the case with peptides or other small molecules used as immunogens. Other substances such as immunomodulators (e.g., cytokines such as the interleukins) may be combined in vaccines as well.

The vaccine art recognizes the use of certain substances called adjuvants to potentiate an immune response when used in conjunction with an immunogen. Adjuvants are further used to elicit an immune response that is faster or greater than would be elicited without the use of the adjuvant. In addition, adjuvants may be used to create an immunological response using less immunogen than would be needed without the inclusion of adjuvant, to increase production of certain antibody subclasses that afford immunological protection or to enhance components of the immune response (e.g., humoral, cellular). Known adjuvants include emulsions such as Freund's Adjuvants and other oil emulsions, Bordetella pertussis, MF59, purified saponin from Quillaja saponaria (QS21), aluminum salts such as hydroxide, phosphate and alum, calcium phosphate (and other metal salts), gels such as aluminum hydroxide salts, mycobacterial products including muramyl dipeptides, solid materials, and particles such as liposomes and virosomes. Examples of natural and bacterial products known to be used as adjuvants include monophosphoryl lipid A (MPL), RC-529 (synthetic MPL-like acylated monosaccharide), OM-174 which is a lipid A derivative from E. coli, holotoxins such as cholera toxin (CT) or one of its derivatives, pertussis toxin (PT) and heat-labile toxin (LT) of E. coli or one of its derivatives, and CpG oligonucleotides. Adjuvant activity can be affected by a number of factors, such as carrier effect, depot formation, altered lymphocyte recirculation, stimulation of T-lymphocytes, direct stimulation of B-lymphocytes and stimulation of macrophages.

Vaccines are typically prepared as injectables, either as liquid solutions or suspensions; solid forms suitable for solution in, or suspension in, liquid prior to injection may also be prepared. The preparation may also be emulsified. The active immunogenic ingredient is often mixed with excipients, which are pharmaceutically acceptable and compatible with the active ingredient. Suitable excipients are, e.g., water, saline, dextrose, glycerol, ethanol, or the like and combinations thereof. In addition, if desired, the vaccine may contain minor amounts of auxiliary substances such as wetting or emulsifying agents, pH buffering agents, or adjuvants, which enhance the effectiveness of the vaccine. The vaccines are conventionally administered parenterally, by injection, for example, either subcutaneously or intramuscularly. Additional formulations which are suitable for other modes of administration include suppositories and, in some cases, oral formulations. For suppositories, traditional binders and carriers may include, for example, polyalkylene glycols or triglycerides; such suppositories may be formed from mixtures containing the active ingredient in the range of 0.5% to 10%, preferably 1-2%. Oral formulations include such normally employed excipients as, for example, pharmaceutical grades of mannitol, lactose, starch, magnesium stearate, sodium saccharine, cellulose, magnesium carbonate and the like. These compositions take the form of solutions, suspensions, tablets, pills, capsules, sustained release formulations or powders and contain 10%-95% of active ingredient, preferably 25-70%.

Vaccines may also be administered through transdermal routes utilizing jet injectors, microneedles, electroporation, sonoporation, microencapsulation, polymers or liposomes, and through transmucosal routes and intranasal routes using nebulizers, aerosols and nasal sprays. Microencapsulation using natural or synthetic polymers such as starch, alginate and chitosan, D-poly L-lactate (PLA), D-poly DL-lactic-co-glycolic microspheres, polycaprolactones, polyorthoesters, polyanhydrides and polyphosphazenes is useful for both transdermal and transmucosal administration. Polymeric complexes comprising synthetic poly-ornithine, poly-lysine and poly-arginine or amphipathic peptides are useful for transdermal delivery systems. In addition, due to their amphipathic nature, liposomes are contemplated for transdermal, transmucosal and intranasal vaccine delivery systems. Common lipids used for vaccine delivery include N-(1)2,3-(dioleyl-dihydroxypropyl)-N,N,N-trimethylammonium-methyl sulfate (DOTAP), dioleyloxy-propyl-trimethylammonium chloride (DOTMA), dimyristyloxypropyl-3-dimethyl-hydroxyethyl ammonium (DMRIE), dimethyldioctadecyl ammonium bromide (DDAB) and 3β-[N-(N′,N′-dimethylaminoethane) carbamoyl] cholesterol (DC-Chol). The combination of helper lipids and liposomes will enhance uptake of the liposomes through the skin. These helper lipids include dioleoyl phosphatidylethanolamine (DOPE), dilauroylphosphatidylethanolamine (DLPE), dimyristoyl phosphatidylethanolamine (DMPE) and dipalmitoylphosphatidylethanolamine (DPPE). In addition, triterpenoid glycosides or saponins derived from the Chilean soap tree bark (Quillaja saponaria) and chitosan (deacetylated chitin) have been contemplated as useful adjuvants for intranasal and transmucosal vaccine delivery.

The proteins may be formulated into the vaccine as neutral or salt forms. Pharmaceutically acceptable salts include the acid addition salts (formed with the free amino groups of the peptide), which are formed with inorganic acids such as, e.g., hydrochloric or phosphoric acids, or such organic acids as acetic, oxalic, tartaric and mandelic. Salts formed with the free carboxyl groups may also be derived from inorganic bases such as, e.g., sodium, potassium, ammonium, calcium, or ferric hydroxides, and such organic bases as isopropylamine, trimethylamine, 2-ethylamino ethanol, histidine, and procaine.

The vaccines are administered in a manner compatible with the dosage formulation, and in such amount as will be therapeutically effective and immunogenic. The quantity to be administered depends on the subject to be treated, capacity of the subject's immune system to synthesize antibodies, and the degree of protection desired. Precise amounts of active ingredient required to be administered depend on the judgment of the practitioner and are peculiar to each individual. However, suitable dosage ranges are of the order of several hundred micrograms active ingredient per individual. Suitable regimes for initial administration and booster shots are also variable, but are typified by an initial administration followed at one- or three-month intervals by a subsequent injection or other administration.

Upon immunization with a vaccine composition as described herein, the immune system of the host responds to the vaccine by producing large amounts of CTLs specific for the desired antigen, and the host becomes at least partially immune to later infection, or resistant to developing chronic infection. Vaccine compositions containing the NTHi polypeptide or NTHi peptides of the invention are administered to a patient susceptible to or otherwise at risk of bacterial infection to elicit an immune response against the antigen and thus enhance the patient's own immune response capabilities. Such an amount is defined to be an “immunogenically effective dose.” In this use, the precise amounts again depend on the patient's state of health and weight, the mode of administration, the nature of the formulation, etc., but generally range from about 1.0 μg to about 5000 per 70 kilogram patient, more commonly from about 10 to about 500 mg per 70 kg of body weight. For therapeutic or immunization purposes, the NTHi polypeptide or NTHi peptides of the invention can also be expressed by attenuated viral hosts, such as vaccinia or fowlpox. This approach involves the use of vaccinia virus as a vector to express nucleotide sequences that encode the peptides of the invention. Upon introduction into an acutely or chronically infected host or into a noninfected host, the recombinant vaccinia virus expresses the immunogenic peptide, and thereby elicits a host CTL response.

Humoral immune response may be measured by many well known methods, such as Single Radial Immunodiffusion Assay (SRID), Enzyme Immunoassay (EIA) and Hemagglutination Inhibition Assay (HAI). In particular, SRID utilizes a layer of a gel, such as agarose, containing the immunogen being tested. A well is cut in the gel and the serum being tested is placed in the well. Diffusion of the antibody out into the gel leads to the formation of a precipitation ring whose area is proportional to the concentration of the antibody in the serum being tested. EIA, also known as ELISA (enzyme-linked immunosorbent assay), is used to determine total antibodies in the sample. The immunogen is adsorbed to the surface of a microtiter plate. The test serum is exposed to the plate followed by an enzyme linked immunoglobulin, such as IgG. The enzyme activity adherent to the plate is quantified by any convenient means such as spectrophotometry and is proportional to the concentration of antibody directed against the immunogen present in the test sample. HAI utilizes the capability of an immunogen such as viral proteins to agglutinate chicken red blood cells (or the like). The assay detects neutralizing antibodies, i.e., those antibodies able to inhibit hemagglutination. Dilutions of the test serum are incubated with a standard concentration of immunogen, followed by the addition of the red blood cells. The presence of neutralizing antibodies will inhibit the agglutination of the red blood cells by the immunogen. Tests to measure cellular immune response include determination of delayed-type hypersensitivity or measuring the proliferative response of lymphocytes to target immunogen.
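The HAI readout described above reduces to finding the highest serum dilution that still inhibits agglutination. The following is a minimal sketch in Python of that readout, assuming a serial dilution series and a yes/no inhibition call per well. The function name hai_titer and the sample data are illustrative assumptions and are not taken from the specification.

```python
def hai_titer(dilutions, inhibited):
    """Return the reciprocal of the highest dilution that still inhibits
    hemagglutination, or None if no well shows inhibition.

    dilutions: reciprocal dilution factors in ascending order (e.g. 10, 20, 40)
    inhibited: one boolean per well, True when agglutination was inhibited
    """
    if len(dilutions) != len(inhibited):
        raise ValueError("one readout is required per dilution")
    titer = None
    for factor, ok in zip(dilutions, inhibited):
        if not ok:
            break  # first well with agglutination ends the inhibition run
        titer = factor
    return titer


# Hypothetical two-fold series: inhibition is lost at the 1:160 well.
series = [10, 20, 40, 80, 160, 320]
readout = [True, True, True, True, False, False]
print(hai_titer(series, readout))
```

Stopping at the first well that agglutinates is a design choice of this sketch: it treats any later inhibited well as an artifact rather than as evidence of a higher titer.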

Nontypeable Haemophilus influenzae (NTHi)

H. influenzae is a small, nonmotile gram negative bacterium. Unlike other H. influenzae strains, the nontypeable H. influenzae (NTHi) strains lack a polysaccharide capsule and are sometimes denoted as “nonencapsulated.” NTHi strains are genetically distinct from encapsulated strains and are more heterogeneous than the type b H. influenzae isolates. NTHi presents a complex array of antigens to the human host. Possible antigens that may elicit protection include OMPs, lipopolysaccharides, lipoproteins, adhesion proteins and noncapsular proteins.

Humans are the only host for H. influenzae. NTHi strains commonly reside in the upper respiratory tract including the nasopharynx and the posterior oropharynx, the lower respiratory tract and the female genital tract. NTHi causes a broad spectrum of diseases in humans, including but not limited to, otitis media, pneumonia, sinusitis, septicemia, endocarditis, epiglottitis, septic arthritis, meningitis, postpartum and neonatal infections, postpartum and neonatal sepsis, acute and chronic salpingitis, epiglottis, pericarditis, cellulitis, osteomyelitis, endocarditis, cholecystitis, intraabdominal infections, urinary tract infection, mastoiditis, aortic graft infection, conjunctivitis, Brazilian purpuric fever, occult bacteremia and exacerbation of underlying lung diseases such as chronic bronchitis, bronchiectasis and cystic fibrosis.

Epidemiologic studies of NTHi have indicated that the strains are heterogeneous with respect to outer membrane protein profiles (Barenkamp et al., Infect. Immun., 36: 535-40, 1982), enzyme allotypes (Musser et al., Infect. Immun., 52: 183-191, 1986), and other commonly used epidemiologic tools. There have been several attempts to subtype NTHi, but none of the methodologies has been totally satisfactory. The outer-membrane protein composition of NTHi consists of approximately 20 proteins. All NTHi strains contain two common OMPs with molecular weights of 30,000 and 16,600 daltons. NTHi strains may be subtyped based on two OMPs within the 32,000-42,000 dalton range. The NTHi liposaccharide profile is fundamentally different from that of the enteric gram negative bacteria and separates into 1-4 distinct bands ranging from less than 20,000 daltons.

A prototype NTHi isolate is the low passage isolate 86-028NP which was recovered from a child with chronic otitis media. This strain has been well characterized in vitro (Bakaletz et al., Infect. Immun., 53: 331-5, 1988; Holmes et al., Microb. Pathog., 23: 157-66, 1997) as well as in the chinchilla OM model (described herein) (Bakaletz et al., Vaccine, 15: 955-61, 1997; Suzuki et al., Infect. Immun., 62: 1710-8, 1994; DeMaria et al., Infect. Immun., 64: 5187-92, 1996). The 86-028NP strain was used, as described herein, to identify genes that are up-regulated in expression in the chinchilla model of otitis media and genes that are necessary for NTHi survival in the chinchilla middle ear.

DFI Strategy

A differential fluorescence induction (DFI) strategy was used herein to identify NTHi genes induced during OM in a chinchilla animal model. Several methods have been developed to identify bacterial genes that contribute to the virulence of an organism during infection. Such methods include in vivo expression technology (IVET), in which bacterial promoters regulate the expression of gene(s) required for synthesis of essential nutrients required for survival in the host; signature-tagged mutagenesis (STM), enabling tag-specific identification of genes that alter the virulence properties of a microorganism when mutated; DNA microarray technology to globally screen for transcriptionally active genes; and DFI, which uses fluorescence-activated cell sorting (FACS) analysis to select for transcriptionally active promoters (Chiang et al., Annu. Rev. Microbiol., 53: 129-154, 1999). DFI is a high-throughput method that allows for the identification of differentially regulated genes regardless of the basal level of expression and does not exclude those that are essential for growth in vitro.

DFI has been successfully utilized in many microorganisms. For example, a GFP reporter system and flow cytometry were used to study mycobacterial gene expression upon interaction with macrophages (Dhandayuthapani et al., Mol. Microbiol., 17: 901-912, 1995). A promoter trap system was used to identify genes whose transcription was increased when Salmonellae were subjected to environments simulating in vivo growth and when internalized by cultured macrophage-like cells (Valdivia and Falkow, Mol. Microbiol., 22: 367-378, 1996; Valdivia and Falkow, Science, 277: 2007-2011, 1997; Valdivia and Falkow, Curr. Opin. Microbiol., 1: 359-363, 1998). In addition, DFI has been used to identify promoters expressed in S. pneumoniae and S. aureus when grown under varied in vitro conditions simulating infection (Marra et al., Infect. Immun., 70(3): 1422-1433, 2002; Schneider et al., Proc. Natl. Acad. Sci. USA., 97: 1671-1676, 2000). In addition, DFI has been utilized to study gene regulation in Bacillus cereus in response to environmental stimuli (Dunn and Handelsman, Gene, 226: 297-305, 1999), in S. pneumoniae in response to a competence stimulatory peptide (Bartilson et al., Mol. Microbiol., 39: 126-135, 2001), and upon interaction with and invasion of host cells in Bartonella henselae (Lee and Falkow, Infect. Immun., 66: 3964-3967, 1998), Listeria monocytogenes (Wilson et al., Infect. Immun., 69: 5016-5024, 2001), Brucella abortus (Eskra et al., Infect. Immun., 69: 7736-7742, 2001), and Escherichia coli (Badger et al., Mol. Microbiol., 36: 174-182, 2000).

Whereas DFI has been successfully used to identify promoters active in cell culture models of infection or in vitro conditions designed to simulate an in vivo environment, few have applied DFI to identify promoters regulated in a specific biological niche within the whole animal. This is likely due to the numerous challenges associated with sorting from an in vivo environment. The host inflammatory response, dissemination and/or clearance of bacterial cells from the site of infection, as well as adherence of bacteria to epithelial cells, possibly via biofilm formation, can make bacteria inaccessible for retrieval from the living animal. These factors, among others, contribute to the complexity of the microenvironment and the heterogeneity of gene expression as the bacteria sense and respond to these changes. Recently, DFI has been used to identify promoters expressed in S. pneumoniae when the bacteria were screened in a mouse model of respiratory tract infection and a gerbil infection model of OM (Marra et al., Infect. Immun. 70: 1422-33, 2002; Marra et al., Microbiol., 148: 1483-91, 2002).

Animal Model

The chinchilla model is a widely accepted experimental model for OM.

In particular, a chinchilla model of NTHi-induced OM has been well characterized (Bakaletz et al., J. Infect. Dis., 168: 865-872, 1993; Bakaletz and Holmes, Clin. Diagn. Lab. Immunol., 4: 223-225, 1997; Suzuki and Bakaletz, Infect. Immun., 62: 1710-1718, 1994), and has been used to determine the protective efficacy of several NTHi outer membrane proteins, combinations of outer membrane proteins, chimeric synthetic peptide vaccine components, and adjuvant formulations as vaccinogens against OM (Bakaletz et al., Vaccine, 15: 955-961, 1997; Bakaletz et al., Infect. Immun., 67: 2746-2762, 1999; Kennedy et al., Infect. Immun., 68: 2756-2765, 2000).

In particular, there is a unique in vivo model wherein adenovirus predisposes chinchillas to H. influenzae-induced otitis media, which allowed for the establishment of relevant cell, tissue and organ culture systems for the biological assessment of NTHi (Bakaletz et al., J. Infect. Dis., 168: 865-72, 1993; Suzuki et al., Infect. Immunity 62: 1710-8, 1994). Adenovirus infection alone has been used to assess the transudation of induced serum antibodies into the tympanum (Bakaletz et al., Clin. Diagnostic Lab Immunol., 4(2): 223-5, 1997) and has been used as a co-pathogen with NTHi to determine the protective efficacy of several active and passive immunization regimens targeting various NTHi outer membrane proteins, combinations of OMPs, chimeric synthetic peptide vaccine components, and adjuvant formulations as vaccinogens against otitis media (Bakaletz et al., Infect Immunity, 67(6): 2746-62, 1999; Kennedy et al., Infect Immun., 68(5): 2756-65, 2000; Novotny et al., Infect Immunity 68(4): 2119-28, 2000; Poolman et al., Vaccine 19 (Suppl. 1): S108-15, 2000).

Genes Upregulated In vivo in Response to NTHi Infection of the Middle Ear

In order to identify differentially regulated promoters in response to NTHi infection of the middle ear, a promoter trap library was constructed and sorting parameters defined. A portion of the promoter trap library was inoculated directly into the chinchilla middle ear and OM development was monitored by video otoscopy and tympanometry at 24 and 48 hours. In addition, the middle ear fluids were recovered 24 and 48 hours after infection. Two-color FACS analysis was used to isolate bacteria that were expressing GFP from other cells and debris associated with the effusion. Following isolation, the DNA sequences of the Haemophilus inserts 5′ of the gfpmut3 gene were determined and analyzed. In this manner, we identified genes that are up-regulated as NTHi sense and respond to the environment of the chinchilla middle ear during AOM. The following genes were identified and, due to their up-regulation during NTHi infection, they may play a role in NTHi infection and virulence.
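The two-color sorting step can be pictured as a gate that keeps only events positive in both fluorescence channels. The following Python sketch illustrates that idea under stated assumptions: the second channel is taken to be a label that marks bacteria, and the thresholds, the Event type and the function sort_gate are hypothetical names introduced for illustration. The sorting parameters actually used are those defined in the Examples, not the values shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    gfp: float    # green-channel intensity, arbitrary units
    label: float  # second-channel intensity (assumed bacterial label)


def sort_gate(events, gfp_min, label_min):
    """Keep events that are positive in both channels."""
    return [e for e in events if e.gfp >= gfp_min and e.label >= label_min]


# Hypothetical events from a middle ear effusion sample.
events = [
    Event(gfp=850.0, label=900.0),  # labeled bacterium with an active promoter
    Event(gfp=12.0, label=880.0),   # labeled bacterium, promoter inactive
    Event(gfp=700.0, label=5.0),    # autofluorescent debris or host cell
    Event(gfp=8.0, label=3.0),      # debris
]
sorted_events = sort_gate(events, gfp_min=100.0, label_min=100.0)
print(len(sorted_events))
```

Requiring both signals is what separates GFP-expressing bacteria from green autofluorescent debris, which a single-channel gate would also collect.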

As described below in Example 7, following the DFI procedure described above and subsequent FACS analysis of gfp-expressing clones, 52 candidate clones containing potential in vivo-regulated promoters were isolated. The genes these clones control were categorized based upon general description and function within the cell and include general metabolic processes, environmental informational processing and membrane transport, membrane proteins and hypothetical proteins. Eight of these 52 clones contain sequences that are unique to NTHi strain 86-028NP. Importantly, 3 clones were isolated from independent screens in more than one animal, thereby verifying the method of isolation.

In order to independently confirm the FACS data, we determined the relative expression of candidate genes by quantitative Reverse Transcriptase-Polymerase Chain Reaction (RT-PCR). The parent strain, 86-028NP, was used for these studies. Thus, wild-type gene expression was analyzed without the influence of plasmid copy number on gene regulation, allowing false-positive clone identifications by FACS to be recognized. Of the 44 candidate clones containing sequence similar to that identified in H. influenzae strain Rd, quantitative comparison of gene expression in vitro and in vivo confirmed up-regulated gene expression for twenty-six genes (60%) when NTHi respond to environmental cues present in the chinchilla middle ear. This analysis identified in vivo-regulated promoters which drive expression of genes involved in membrane transport, environmental informational processing, cellular metabolism and gene regulation, as well as hypothetical proteins with unknown function. (See Table 4 in Example 6).
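The 60% figure quoted above is a rounded ratio of confirmed clones to tested clones. A minimal arithmetic check follows; it uses only the two counts stated in the text, and the function name is illustrative rather than part of the described method.

```python
def confirmation_rate(confirmed: int, tested: int) -> float:
    """Percentage of tested candidate clones whose up-regulation was confirmed."""
    if tested <= 0:
        raise ValueError("tested must be a positive count")
    return 100.0 * confirmed / tested


# Counts taken from the text: 26 of the 44 Rd-like candidate clones
# showed up-regulated expression by quantitative RT-PCR.
rate = confirmation_rate(26, 44)
print(f"{rate:.1f}% of tested clones confirmed")
```

The exact ratio is slightly below 60%, which is consistent with the rounded value given in the text.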

Quantitative RT-PCR demonstrated a two-fold increase in expression of lolA, whose product enables lipoprotein transport from the inner membrane to the outer membrane. Bacteria grow rapidly in the middle ear environment, reaching 5.0×10⁸ CFU NTHi/ml middle ear fluid within 48 hours. The bacteria sense and respond to the environment, acquiring or synthesizing the necessary nutrients for growth and survival. The gene encoding the membrane component in ribose sugar transport, rbsC (SEQ ID NO: 619), showed a 5-fold increase in expression in vivo compared to cells growing in vitro. In addition, many genes involved in metabolic processes show a dramatic increase in gene expression in vivo compared to cells growing in vitro. These include a riboflavin synthesis gene, ribB (SEQ ID NO: 623), a purine nucleotide biosynthetic gene, purE (SEQ ID NO: 621), ornithine carbamoyltransferase, arcB (SEQ ID NO: 625), involved in arginine degradation via the urea cycle, and uxuA (SEQ ID NO: 627), encoding mannonate hydrolase, required for the uptake of D-glucuronate and transformation into glyceraldehyde 3-phosphate. In addition, but to a lesser degree, genes for histidine biosynthesis (hisB; SEQ ID NO: 615), DNA repair (radC; SEQ ID NO: 639) and a putative intracellular septation transmembrane protein (ispZ; SEQ ID NO: 637) were up-regulated.

Disulfide bond formation is important for folding and assembly of many secreted proteins in bacteria. In prokaryotes, DsbA and DsbB make up the oxidative pathway responsible for the formation of disulfides. DsbB reoxidizes DsbA, which donates disulfide bonds directly to unfolded polypeptides, and DsbB has been demonstrated to generate disulfides de novo from oxidized quinones (Collet and Bardwell, Mol. Microbiol., 44: 1-8, 2002). In H. influenzae strain Rd, DsbA is required for competence for transformation (Tomb, Proc. Natl. Acad. Sci. U.S.A., 89: 10252-10256, 1992). Herein, an approximately 3-fold increase in dsbB gene (SEQ ID NO: 629) transcription was demonstrated, illuminating an important role for disulfide interchange for NTHi growing in the middle ear environment.

Bacterial colonization of the middle ear, a normally sterile environment, results in a host inflammatory response and subsequent neutrophil infiltration. Bacteria have evolved numerous strategies to combat this host response. NTHi increase gene expression (4-fold) of ureH (SEQ ID NO: 631), a homologue of a gene required for expression of active urease in Helicobacter, shown to be involved in acid tolerance (Young et al., J. Bacteriol., 178: 6487-6495, 1996). Recently, it has been reported that urease activity may play a role in chronic Actinobacillus pleuropneumoniae infection by counteracting the decrease in pH occurring upon infection (Baltes et al., Infect. Immun., 69: 472-478, 2000; Baltes et al., Infect. Immun., 69: 472-478, 2001; Bosse and MacInnes, Can. J. Vet. Res., 64: 145-150). A biotype analysis on NTHi isolates from middle ear effusions demonstrated that 87% are urease positive (DeMaria et al., J. Clin. Microbiol., 20: 1102-1104, 1984). However, the role of urease in NTHi virulence is unknown. Similarly, an increase was observed in expression of a gene whose product demonstrates 88% sequence identity to a pyridoxine biosynthesis protein in S. pneumoniae and 60% homology to a putative singlet oxygen resistance protein that may function as an antioxidant. Phosphorylcholine (ChoP) has been implicated in the pathogenesis of NTHi (Weiser et al., Infect. Immun., 65: 943-950, 1997). NTHi modulates ChoP expression by phase variation, decorating the LOS on the cell surface. ChoP may contribute to NTHi persistence in the respiratory tract via decreased susceptibility to antimicrobial peptides (Lysenko et al., Infect. Immun., 68: 1664-1671, 2000) and alter the sensitivity to serum killing mediated by C-reactive protein (CRP) (Weiser et al., J. Exp. Med., 187: 631-640, 1998). The microenvironment of the nasopharynx and middle ear cavity may select for the ChoP⁺ phenotype, as ChoP⁺ strains show greater colonization of the chinchilla nasopharynx (Tong et al., Infect. Immun., 68: 4593-4597, 2000). Expression of the licC gene (SEQ ID NO: 633) was also increased. The licC gene encodes a phosphorylcholine cytidylyltransferase that plays a role in the biosynthesis of phosphorylcholine-derivatized LOS (Rock et al., J. Bacteriol., 183: 4927-4931, 2001).

Also included among the in vivo-induced genes is a set whose products subsequently regulate gene expression or DNA replication. These include the gene encoding the glp repressor, glpR (SEQ ID NO: 643), which regulates glycerol metabolism at the transcriptional level, the arginine repressor gene, argR (SEQ ID NO: 647), and the integration host factor (IHF) beta subunit, ihfB (SEQ ID NO: 645). IHF is a histone-like protein that binds DNA at specific sequences, an accessory factor involved in replication, site-specific recombination and transcription, altering the activity of a large number of operons (Goosen and van de Putte, Mol. Microbiol. 16: 1-7, 1995). In addition, CspD inhibits DNA replication during stationary phase-induced stress response in E. coli (Yamanaka et al., Mol. Microbiol., 39: 1572-1584, 2001) and the mukF (SEQ ID NO: 641) gene protein homologue contributes to a remodeling of the nucleoid structure into a more compact form prior to cell segregation (Sawitzke and Austin, Proc. Natl. Acad. Sci. U.S.A., 62: 1710-1718, 2000). The DFI strategy described herein also identified promoters induced in vivo for genes of unknown function. The hypothetical protein, HI0094, demonstrated an 8-fold increase in gene expression during early OM but its role remains unknown. HI1163 (SEQ ID NO: 651) showed 58% amino acid identity with the hypothetical YdiJ proteins, a putative oxidase, of E. coli.

A high-density transposon mutagenesis strategy was used to identify H. influenzae genes essential for growth on rich medium (Akerley et al., Proc. Natl. Acad. Sci. U.S.A., 99: 966-971, 2002). Six genes were identified in the screen described herein that are included in the essential gene set described in Akerley et al., supra (hisB, lppB, lolA, ispZ, mukF and unknown HI0665). Recently, genes of non-typeable H. influenzae that are expressed upon interaction with two human respiratory tract-derived epithelial cell lines have been identified. These genes included those involved in metabolic processes, stress responses, gene expression, cell envelope biosynthesis, DNA-related processes, cell division and ORFs encoding proteins of unknown function (Ulsen et al., Mol. Microbiol., 45: 485-500, 2002). Similarly, the stress response gene, cspD (SEQ ID NO: 649), genes involved in purine and riboflavin biosynthesis, and a protein of unknown function, vapA, were identified in the screen described herein. Expression of vapA was detected in vitro, yet vapA gene expression increased two-fold in vivo. These unique approaches identified known genes that are upregulated in NTHi-induced OM and therefore are likely to play a role in NTHi infection and virulence; these genes may be potential candidates for vaccines, antisense therapies and other therapeutic methods of treatment of NTHi related disorders.

The DFI strategy resulted in the identification of promoters induced in vivo for genes of unknown function as well. The hypothetical protein, HI0094, demonstrated an 8-fold increase in gene expression during early OM but its role remains unknown. HI1163 (SEQ ID NO: 651) showed 58% amino acid identity with the hypothetical YdiJ proteins, a putative oxidase, of E. coli. Therefore, these hypothetical genes are likely to play a role in OM induced by NTHi infection.

BRIEF DESCRIPTION OF FIGURES

FIG. 1 depicts the LKP gene region in a panel of Haemophilus isolates. The strain 86-028NP sequence is identical in this region to the sequence in NTHi strain R3001. Both of these NTHi lack the hif gene cluster encoding the hemagglutinating pilus.

FIG. 2 depicts the rfaD region in a panel of Haemophilus isolates. The gene arrangement in the rfaD region of the strain 86-028NP genome is similar to that of the strain Rd genome but different from the arrangement of these genes seen in the genome of most NTHi examined.

FIGS. 3A-3M set out the nucleotide sequences (SEQ ID NOS: 589-614) described in Table 4, which were identified to be upregulated during OM infection (see Example 6). The nucleotides (nt.) which correspond to known genes and those nt. which correspond to the contig sequences set out as SEQ ID NO: 1-576 are also presented.

DETAILED DESCRIPTION

The following examples illustrate the invention wherein Example 1 describes the sequence of the NTHi genome, Example 2 describes the identified contigs and initial gene discovery, Example 3 describes construction of the NTHi promoter trap library, Example 4 describes the analyses of 86-028NP derivatives expressing GFP, Example 5 demonstrates direct labelling of bacteria from middle ear fluids, Example 6 describes identification of promoters induced in vivo in acute otitis media, Example 7 describes identification of virulence-associated genes, and Example 8 describes identification of unique NTHi gene sequences.

EXAMPLE 1 Sequence of a Non-Typeable Haemophilus influenzae Genome

NTHi strain 86-028NP is a minimally passaged clinical isolate obtained from a pediatric patient who underwent tympanostomy and tube insertion for chronic OM at Columbus Children's Hospital (Bakaletz et al., Infection and Immunity, 56(2): 331-335, 1988). The 86-028NP strain was deposited with the American Type Culture Collection (Manassas, Va. 20108 USA) on Oct. 16, 2002 and assigned accession no. PTA-4764.

In an effort to more broadly approach the identification of the virulence determinants in NTHi, the genome of the NTHi 86-028NP strain was sequenced to 3-fold coverage. Chromosomal DNA was prepared from strain 86-028NP using the Puregene protocol and sheared to 2-4 kb in size with a Hydroshear instrument (Gene Machines). The sheared DNA was ethanol-precipitated, end-repaired using a mixture of Klenow enzyme and T4 DNA polymerase, and size-selected by agarose gel electrophoresis to obtain 2-4 kb fragments as described in Chissoe et al. (Methods: a Companion to Methods in Enzymology 3: 55-65, 1991) and Sambrook et al. (Molecular Cloning: a Laboratory Manual, 2nd Ed. Cold Spring Harbor, N.Y., 1989). These fragments were cloned into vector pUC18 using the SmaI restriction site (phosphatase-treated) and transformed into E. coli XL-1 Blue, selecting for ampicillin resistance. Colonies that contain inserts were identified by blue/white screening on LB-Amp plates containing X-gal, and transferred into 96-deep well plates containing 1.5 ml of TB-Amp (TB=Terrific Broth). The deep-well plate cultures were grown overnight (18-22 hours) at 37° C. Template preparation, sequencing and contig assembly were then performed as described below.

Automated template preparation was performed on the Beckman Biomek 2000 automated robotics workstation as described in Chissoe et al. (supra). Briefly, each 96-deep well plate, containing the clones prepared above, was centrifuged to pellet the cells, the supernatant decanted, and the cells frozen (if necessary) at −20° C. Four 96-deep well blocks were placed on the Biomek table and the liquid handling robot was used to prepare the template using an automated version of a typical SDS-NaOH lysis protocol as described in Chissoe et al. (supra). The final ethanol-precipitated templates were each dissolved in 50 μl ddH₂O and used for DNA sequencing.

Sequencing reactions were run by re-arraying the templates (from 96-well plates) into 384-well plates, using the Robbins Hydra 96 robot. Cycle-sequencing reactions were run using PE Big-Dye™ terminators and universal primers (M13 forward and reverse), cleaned up over Sephadex G50 columns, and analyzed on a PE Biosystems 3700 capillary electrophoresis DNA sequencer according to the manufacturer's instructions. Sequencing reads (8219) were assembled into 576 contigs (SEQ ID NOS: 1-576 herein). The statistics for the 3-fold sequencing are shown in Table 2A. The total unique sequence in assembly 17 is 1.74 Mb.

TABLE 2A
Contig Size | Total Number | Total Length | % of Cumulative
0–1 kb | 65 | 55961 | 3.2%
1–2 kb | 228 | 333665 | 19.2%
2–3 kb | 101 | 243059 | 14.0%
3–4 kb | 49 | 172385 | 9.9%
4–5 kb | 45 | 196699 | 11.3%
5–10 kb | 74 | 515152 | 29.6%
10–20 kb | 11 | 144591 | 8.3%
20–30 kb | 3 | 77352 | 4.4%
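The summary figures quoted for the 3-fold assembly (576 contigs, 1.74 Mb of unique sequence, and the percentage column of Table 2A) can be recomputed from the per-bin counts and lengths. The sketch below is only an arithmetic cross-check; the variable and function names are illustrative, and the percentage column is assumed to be each bin's share of the total assembled length.

```python
# (size bin, number of contigs, total length in bp) as listed in Table 2A
TABLE_2A = [
    ("0-1 kb", 65, 55961),
    ("1-2 kb", 228, 333665),
    ("2-3 kb", 101, 243059),
    ("3-4 kb", 49, 172385),
    ("4-5 kb", 45, 196699),
    ("5-10 kb", 74, 515152),
    ("10-20 kb", 11, 144591),
    ("20-30 kb", 3, 77352),
]


def summarize(rows):
    """Return total contig count, total length, and each bin's share of the length."""
    total_contigs = sum(count for _, count, _ in rows)
    total_length = sum(length for _, _, length in rows)
    shares = {
        name: round(100.0 * length / total_length, 1)
        for name, _, length in rows
    }
    return total_contigs, total_length, shares


total_contigs, total_length, shares = summarize(TABLE_2A)
print(total_contigs, "contigs,", total_length, "bp")
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

Under that assumption the recomputed totals agree with the 576 contigs and roughly 1.74 Mb stated in the text, and the three largest bins together contain the 88 contigs of at least 5 kb that are analyzed in Example 2.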

Subsequently, 8-fold sequencing analysis of the NTHi genome was carried out. The 8-fold sequencing assembled the NTHi genome into 11 contigs. Contigs 5, 8, 9, 10, 12-18 are denoted as SEQ ID NOS: 675-685 herein. The statistics for the 8-fold sequencing are shown in Table 2B.

TABLE 2B
Contig Size | Total Number | Total Length | % of Cumulative
0–1 kb | 5 | 3950 | 0.2%
1–2 kb | 3 | 4316 | 0.2%
2–3 kb | 0 | 0 | 0.0%
3–4 kb | 1 | 3964 | 0.2%
4–5 kb | 0 | 0 | 0.0%
5–10 kb | 0 | 0 | 0.0%
10–20 kb | 1 | 15147 | 0.8%
20–30 kb | 2 | 51888 | 2.7%
30–40 kb | 0 | 0 | 0.0%
40–50 kb | 0 | 0 | 0.0%
50–100 kb | 1 | 85814 | 4.5%
>100 kb | 5 | 1760339 | 91.4%

EXAMPLE 2 Contig Description and Initial Gene Discovery

Seventy-five of the 88 contigs with length ≥5000 bp, identified with the 3-fold sequence analysis, show significant similarity via BLASTN to genes in H. influenzae strain Rd. To visualize the potential relationship between the gene order in H. influenzae strain 86-028NP and H. influenzae strain Rd, the 86-028NP three-fold contig set and the Rd gene set were bidirectionally compared using BLASTN. The results were plotted in gene-order versus contig space by sorting the contigs based on gene coordinates of the Rd genes hit, anchoring each contig at the smallest coordinate found as described in Ray et al. (Bioinformatics 17: 1105-12, 2001). Compared in this fashion, an incomplete assembly of a genome with identical gene order to a completely known genome would display a monotonically increasing stair-stepped form.
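The ordering step described above can be sketched as follows. This is a minimal illustration, not the procedure of Ray et al.: the input is assumed to be a list of (contig, Rd gene coordinate) pairs already extracted from the BLASTN results, and the contig names and coordinates in the example are hypothetical.

```python
def order_contigs(hits):
    """Sort contigs by the smallest Rd gene coordinate hit by each contig."""
    anchors = {}
    for contig, coord in hits:
        # Anchor each contig at the smallest coordinate found.
        anchors[contig] = min(coord, anchors.get(contig, coord))
    return sorted(anchors, key=anchors.get)


def stair_step_points(hits):
    """Return (contig rank, Rd coordinate) points for a gene-order vs. contig plot."""
    rank = {contig: i for i, contig in enumerate(order_contigs(hits))}
    return sorted((rank[contig], coord) for contig, coord in hits)


def is_monotonic(points):
    """True when Rd coordinates never decrease as the contig rank increases."""
    coords = [coord for _, coord in points]
    return all(a <= b for a, b in zip(coords, coords[1:]))


# Hypothetical hits: (contig name, coordinate of the Rd gene that was hit)
example_hits = [
    ("c3", 5200), ("c1", 100), ("c1", 900),
    ("c2", 2100), ("c3", 4800), ("c2", 2500),
]
points = stair_step_points(example_hits)
print(order_contigs(example_hits))
print(points)
print(is_monotonic(points))
```

With collinear gene order, as in the hypothetical data, the points rise in steps from one contig to the next; a rearrangement relative to strain Rd would appear as a contig whose coordinates overlap or fall below those of its neighbor.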

BLASTX was used to identify hits to sequences with homology to genes in the strain Rd genome as well as genes not found in H. influenzae strain Rd. Hits to strain Rd sequences were removed from the data set and the other hits summarized in Table 3A. The data are presented as follows: contig # (=SEQ ID NO: #), column 1; E score for each hit, column 2; the name of the protein that had homology to a portion of the amino acid translation of the cited contig, column 3; the organism producing the homologue, column 4; the Genbank protein identifier for each of the proteins cited in column 3, column 5; and the corresponding nucleotides within the contig (referenced by SEQ ID NO:), column 6. In most instances, several homologues were identified but for clarity, the protein of greatest homology is cited in Table 3A.
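The filtering described above (discard hits to strain Rd, then report one best homologue per region) can be sketched as below. The record layout, the organism label used to flag Rd hits, and the use of the lowest E score as the measure of "greatest homology" are assumptions made for illustration; only the CpdB entry is taken from Table 3A, and the other two records are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    contig: int      # contig number (= SEQ ID NO:)
    region: str      # nucleotide range within the contig
    evalue: float    # BLASTX E score (lower means stronger similarity)
    protein: str
    organism: str


def summarize_hits(hits, excluded_organism="Haemophilus influenzae Rd"):
    """Drop hits to the excluded organism and keep the lowest-E hit per region."""
    best = {}
    for hit in hits:
        if hit.organism == excluded_organism:
            continue
        key = (hit.contig, hit.region)
        if key not in best or hit.evalue < best[key].evalue:
            best[key] = hit
    return sorted(best.values(), key=lambda h: (h.contig, h.region))


example_hits = [
    Hit(104, "204-659", 4.00e-59, "CpdB", "Pasteurella multocida"),     # from Table 3A
    Hit(104, "204-659", 1.00e-20, "weaker homologue", "other species"),  # hypothetical
    Hit(104, "700-900", 1.00e-80, "Rd protein", "Haemophilus influenzae Rd"),  # hypothetical
]
summary = summarize_hits(example_hits)
for hit in summary:
    print(hit.contig, hit.evalue, hit.protein, hit.organism, hit.region)
```

Grouping by an exact region string is a simplification; overlapping but non-identical ranges would need an interval comparison.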

The sequences for some of the genes listed in Table 3A were identified within the 8-fold sequencing of the NTHi genome. Table 3B lists the location of these genes within the 11 contigs, the full-length open reading frame sequence (identified by SEQ ID NO:), the derived amino acid sequence encoded by the open reading frame and the gene with high homology identified by BLASTX (as listed in Table 3A).

To examine the relative short range gene arrangements in NTHi and the Rd strain, the gene order in two gene clusters that have been well-described was compared. First, the genes present in the hemagglutinating pilus (LKP) gene region were examined (Mhlanga-Mutangadura et al., J. Bacteriol. 180(17): 4693-703, 1998). The pilus gene cluster is located between the purE and pepN genes, only fragments of which are depicted in FIG. 1. The serotype b strain, Eagan, contains the hifABCDE gene cluster and produces hemagglutinating pili. Strain Rd lacks the hicAB genes as well as the hifABCDE gene cluster. In general, the nontypeable strains previously examined contained the hicAB genes but not the hif genes that encode the hemagglutinating pilus. The strain 86-028NP sequence (described herein) is identical in this region to the sequence in NTHi strain R3001 (FIG. 1).

Second, the rfaD gene region was examined. The rfaD gene encodes an enzyme involved in the biosynthesis of endotoxin. In addition, the rfaD gene from NTHi strain 2019 has been characterized by Nichols et al. (Infect Immunity 65(4): 1377-86, 1997). In strain 2019, the rfaD gene is immediately upstream of the rfaF gene that encodes another enzyme involved in endotoxin biosynthesis. The gene arrangement in strain Rd is different; the rfaD and rfaF genes are separated by approximately 11 kb of sequence. Most nontypeable strains examined contained the gene arrangement seen in strain 2019. In contrast, strain 86-028NP has a gene arrangement identical to that seen in strain Rd (FIG. 2).

A global analysis of the current assembly indicates that the gene content and order are similar to that in strain Rd. A more detailed analysis revealed that there are a substantial number of NTHi genes not previously seen in the Pasteurellaceae and some regions where the NTHi gene content and order is different from that seen in strain Rd. Thus, the current data suggest that the strain 86-028NP genome will contain a complex mosaic of Rd and non-Rd-like features.

The DFI strategy also identified novel NTHi sequences that had increased gene expression. A list of these novel contig sequences that contain genes or gene fragments that have homology to ORFs in other organisms (primarily gram-negative bacteria) is set out in Table 3A. For example, nucleotides 1498-1845 of contig 442 (SEQ ID NO: 442) are highly homologous to the sequences encoding amino acids 1-116 of H. influenzae strain Rd lipoprotein B (LppB). The gene is positioned between the stationary phase survival gene, surE, and a gene encoding a 43 kD antigenic outer membrane lipoprotein that is highly homologous to the recently identified bacterial lipoprotein, LppB/NlpD, which has been associated with virulence (Padmalayam et al., Infect. Immun., 68: 4972-4979, 2000). Recently, Zhang and coworkers demonstrated that nlpD and surE gene expression was induced during stationary phase of bacterial growth in Thermotoga maritima (Zhang et al., Structure (Camb), 9: 1095-1106, 2001). Therefore, under stress-induced conditions in the middle ear, this NTHi lipoprotein may be expressed.

TABLE 3A
Contig | E score | Hit Identity | Organism | Genbank Protein | SEQ ID NO:
104 | 4.00E−59 | CpdB | Pasteurella multocida | NP_246953.1 | nt. 204–659 of SEQ ID NO: 104
106 | 9.00E−10 | hypothetical protein PH0217 - | Pyrococcus horikoshii | G71244 | nt. 40–309 of SEQ ID NO: 106
106 | 1.00E−08 | unknown | Pasteurella multocida | NP_246871.1 | nt. 605–694 of SEQ ID NO: 106
106 | 2.00E−20 | Orf122 | Chlorobium tepidum | AAG12204.1 | nt. 7–210 of SEQ ID NO: 106
110 | 3.00E−05 | ArdC antirestriction protein | IncW plasmid pSa | AAD52160.1 | complement of nt. 959–1162 of SEQ ID NO: 110
110 | 1.00E−33 | hypothetical protein | Salmonella enterica subsp. enterica serovar Typhi | NP_458676.1 | complement of nt. 181–825 of SEQ ID NO: 110
111 | 5.00E−12 | putative membrane protein | Salmonella enterica subsp. enterica serovar Typhi | NP_458664.1 | complement of nt. 45–287 of SEQ ID NO: 111
111 | 6.00E−41 | hypothetical protein | Salmonella enterica subsp. enterica serovar Typhi | NP_458658.1 | complement of nt. 1091–1480 of SEQ ID NO: 111
114 | 7.00E−80 | unknown | Pasteurella multocida | NP_245828.1 | complement of nt. 118–696 of SEQ ID NO: 114
115 | 2.00E−09 | A111R | Paramecium bursaria Chlorella virus 1 | NP_048459.1 | nt. 555–869 of SEQ ID NO: 115
118 | 5.00E−45 | DNA methylase HsdM, putative | Vibrio cholerae | NP_231404.1 | nt. 44–439 of SEQ ID NO: 118
122 | 2.00E−18 | unknown | Pasteurella multocida | NP_245314.1 | nt. 865–1302 of SEQ ID NO: 122
123 | 4.00E−99 | RNA POLYMERASE SIGMA-32 FACTOR | Proteus mirabilis | P50509 | nt. 351–782 of SEQ ID NO: 123
124 | 9.00E−58 | ACETOLACTATE SYNTHASE (ACETOHYDROXY-ACID SYNTHASE) (ALS) | Spirulina platensis | P27868 | nt. 603–1025 of SEQ ID NO: 124
130 | 0 | restriction modification system-R protein | Neisseria meningitidis | CAA09003.1 | nt. 495–1559 of SEQ ID NO: 130
131 | 6.00E−91 | uronate isomerase (glucuronate isomerase) | Salmonella enterica subsp. enterica serovar Typhi | NP_457532.1 | complement of nt. 661–1380 of SEQ ID NO: 131
133 | 3.00E−30 | GyrA | Pasteurella multocida | NP_245778.1 | complement of nt. 1447–1626 of SEQ ID NO: 133
133 | 1.00E−27 | DNA GYRASE SUBUNIT A | Pectobacterium carotovorum | P41513 | complement of nt. 1302–1442 of SEQ ID NO: 133
138 | 7.00E−06 | KicA | Pasteurella multocida | NP_245545.1 | complement of nt. 92–157 of SEQ ID NO: 138
138 | 1.00E−148 | TYPE II RESTRICTION ENZYME HAEII (ENDONUCLEASE HAEII) (R. HAEII) | Haemophilus aegyptius | O30869 | complement of nt. 164–1045 of SEQ ID NO: 138
143 | 4.00E−06 | Gifsy-1 prophage protein | Salmonella typhimurium LT2 | NP_461555.1 | complement of nt. 228–632 of SEQ ID NO: 143
143 | 1.00E−14 | hypothetical protein | Bacteriophage VT2-Sa | NP_050531.1 | complement of nt. 778–1248 of SEQ ID NO: 143
143 | 5.00E−09 | hypothetical protein | Salmonella enterica subsp. enterica serovar Typhi | CAD09979.1 | complement of nt. 715–1026 of SEQ ID NO: 143
143 | 6.00E−10 | hypothetical 14.9 kd protein | Escherichia coli | NP_065324.1 | nt. 3–173 of SEQ ID NO: 143
147 | 1.00E−38 | GTP-binding elongation factor, may be inner membrane protein | Escherichia coli O157:H7 EDL933 | NP_289127.1 | complement of nt. 172–342 of SEQ ID NO: 147
147 | 2.00E−14 | GTP-binding membrane protein (lepA) | Borrelia burgdorferi | NP_212222.1 | complement of nt. 17–181 of SEQ ID NO: 147
148 | 6.00E−17 | galactokinase | Homo sapiens | AAC35849.1 | complement of nt. 746–1246 of SEQ ID NO: 148
148 | 7.00E−96 | GALACTOKINASE (GALACTOSE KINASE) | Actinobacillus pleuropneumoniae | P94169 | complement of nt. 232–741 of SEQ ID NO: 148
149 | 1.00E−92 | GTP-binding protein TypA/BipA | Buchnera sp. APS | NP_240245.1 | complement of nt. 265–1077 of SEQ ID NO: 149
15 | 2.00E−21 | ORF 1 | Escherichia coli | CAA39631.1 | nt. 665–850 of SEQ ID NO: 15
150 | 6.00E−17 | unknown | Pasteurella multocida | NP_245919.1 | nt. 171–665 of SEQ ID NO: 150
153 | 7.00E−07 | outer membrane protein A | Rickettsia conorii | T30852 | nt. 51–623 of SEQ ID NO: 153
155 | 7.00E−40 | cytochrome d ubiquinol oxidase, subunit II | Vibrio cholerae | NP_233259.1 | nt. 583–1002 of SEQ ID NO: 155
157 | 7.00E−13 | unknown | Pasteurella multocida | NP_245490.1 | complement of nt. 1170–1367 of SEQ ID NO: 157
157 | 2.00E−05 | glycosyl transferase | Neisseria gonorrhoeae | AAA68012.1 | nt. 85–189 of SEQ ID NO: 157
158 | 1.00E−152 | MltC | Pasteurella multocida | NP_246259.1 | complement of nt. 36–530 of SEQ ID NO: 158
161 | 3.00E−25 | lipoprotein, putative | Vibrio cholerae | NP_230232.1 | nt. 870–1439 of SEQ ID NO: 161
163 | 9.00E−53 | chorismate synthase | Caulobacter crescentus | NP_421948.1 | nt. 1283–2029 of SEQ ID NO: 163
168 | 3.00E−13 | COPPER-TRANSPORTING ATPASE 1 (COPPER PUMP 1) | Mus musculus | Q64430 | nt. 66–995 of SEQ ID NO: 168
168 | 2.00E−22 | Cu transporting ATPase P | Homo sapiens | 2001422A | nt. 135–989 of SEQ ID NO: 168
174 | 8.00E−48 | magnesium/cobalt transport protein | Mesorhizobium loti | NP_103977.1 | nt. 918–1205 of SEQ ID NO: 174
175 | 5.00E−26 | vacB protein | Buchnera sp. APS | NP_240369.1 | complement of nt. 1–1587 of SEQ ID NO: 175
176 | 3.00E−21 | putative ABC transport system permease protein | Campylobacter jejuni | NP_282774.1 | complement of nt. 259–1089 of SEQ ID NO: 176
183 | 5.00E−29 | PROBABLE ATP SYNTHASE A CHAIN TRANSMEMBRANE PROTEIN | Ralstonia solanacearum | NP_521442.1 | complement of nt. 42–677 of SEQ ID NO: 183
185 | 6.00E−85 | putative exported protein | Salmonella enterica subsp. enterica serovar Typhi | NP_458655.1 | complement of nt. 162–1529 of SEQ ID NO: 185
187 | 2.00E−05 | transketolase | Homo sapiens | AAA61222.1 | nt. 709–819 of SEQ ID NO: 187
188 | 1.00E−116 | ribonuclease E | Xylella fastidiosa 9a5c | NP_299884.1 | complement of nt. 280–1704 of SEQ ID NO: 188
192 | 1.00E−38 | ImpA | Pasteurella multocida | NP_245829.1 | nt. 35–448 of SEQ ID NO: 192
193 | 3.00E−08 | Orf80 | Enterobacteria phage 186 | NP_052285.1 | nt. 1612–1818 of SEQ ID NO: 193
193 | 1.00E−06 | holin | Haemophilus somnus | AAC45168.1 | nt. 370–576 of SEQ ID NO: 193
193 | 0.007 | unknown | Enterobacteria phage 186 | NP_052260.1 | nt. 1376–1609 of SEQ ID NO: 193
193 | 2.00E−48 | lysozyme | Haemophilus somnus | AAC45169.1 | nt. 608–1093 of SEQ ID NO: 193
199 | 4.00E−21 | unknown protein | Escherichia coli O157:H7 EDL933, prophage CP-933V | NP_288675.1 | nt. 398–778 of SEQ ID NO: 199
199 | 2.00E−49 | hypothetical protein | Bacteriophage 933W | NP_049495.1 | complement of nt. 1907–2392 of SEQ ID NO: 199
20 | 1.00E−62 | RpL14 | Pasteurella multocida | NP_246344.1 | complement of nt. 233–601 of SEQ ID NO: 20
200 | 2.00E−62 | hypothetical protein | Salmonella enterica subsp. enterica serovar Typhi | NP_458658.1 | complement of nt. 431–997 of SEQ ID NO: 200
200 | 3.00E−16 | hypothetical protein | Salmonella enterica subsp. enterica serovar Typhi | NP_458657.1 | complement of nt. 1028–1264 of SEQ ID NO: 200
201 | 2.00E−26 | TsaA | Pasteurella multocida | NP_245732.1 | complement of nt. 1618–1809 of SEQ ID NO: 201
209 | 6.00E−16 | TsaA | Pasteurella multocida | NP_245732.1 | complement of nt. 2–136 of SEQ ID NO: 209
211 | 2.00E−15 | unknown | Pasteurella multocida | NP_245535.1 | complement of nt. 23–211 of SEQ ID NO: 211
211 | 1.00E−70 | PUTATIVE ATPASE PROTEIN | Ralstonia solanacearum | NP_520082.1 | complement of nt. 475–915 of SEQ ID NO: 211
212 | 3.00E−18 | hypothetical protein | Escherichia coli O157:H7 | NP_309775.1 | complement of nt. 895–1035 of SEQ ID NO: 212
216 | 1.00E−173 | unknown | Pasteurella multocida | NP_245069.1 | nt. 35–1543 of SEQ ID NO: 216
217 | 9.00E−18 | diacylglycerol kinase | Vibrio cholerae | NP_233101.1 | nt. 2083–2208 of SEQ ID NO: 217
221 | 4.00E−34 | Tail-Specific Protease | Chlamydia trachomatis | NP_219953.1 | nt. 849–1421 of SEQ ID NO: 221
222 | 4.00E−23 | AGR_C_3689p | Agrobacterium tumefaciens str. C58 (Cereon) | NP_355005.1 | complement of nt. 940–1305 of SEQ ID NO: 222
224 | 9.00E−19 | unknown | Pasteurella multocida | NP_245536.1 | nt. 15–308 of SEQ ID NO: 224
225 | 1.00E−89 | portal vector-like protein, in phage P2 [Salmonella typhimurium LT2] | Salmonella typhimurium LT2 Fels-2 prophage | NP_461651.1 | nt. 31–750 of SEQ ID NO: 225
229 | 2.00E−25 | anaerobic ribonucleotide reductase | Salmonella typhimurium | CAB62266.1 | nt. 1806–2108 of SEQ ID NO: 229
234 | 3.00E−08 | conserved hypothetical protein | Xylella fastidiosa 9a5c | NP_299850.1 | nt. 1680–2048 of SEQ ID NO: 234
234 | 1.00E−42 | Methionine sulfoxide reductase C-terminal domain related protein, YPPQ ortholog | Clostridium acetobutylicum | NP_348177.1 | complement of nt. 415–654 of SEQ ID NO: 234
235 | 4.00E−16 | phage-related tail protein | Wolbachia endosymbiont of Drosophila melanogaster | AAK85310.1 | complement of nt. 931–1929 of SEQ ID NO: 235
235 | 6.00E−56 | similar to orfG protein in phage 186, Salmonella typhimurium LT2 | Salmonella typhimurium LT2, Fels-2 prophage | NP_461625.1 | complement of nt. 313–1863 of SEQ ID NO: 235
236 | 6.00E−20 | conserved hypothetical protein | Pseudomonas aeruginosa | NP_252693.1 | nt. 1572–1916 of SEQ ID NO: 236
240 | 5.00E−27 | MODIFICATION METHYLASE BEPI | Brevibacterium epidermidis | P10283 | complement of nt. 922–1305 of SEQ ID NO: 240
241 | 2.00E−15 | phage-related protein | Xylella fastidiosa 9a5c | NP_299573.1 | complement of nt. 865–1305 of SEQ ID NO: 241
241 | 4.00E−08 | hypothetical protein | phage SPP1 | T42296 | nt. 73–636 of SEQ ID NO: 241
241 | 4.00E−07 | hypothetical protein | Salmonella enterica subsp. enterica serovar Typhi | NP_458686.1 | nt. 10–468 of SEQ ID NO: 241
242 | 2.00E−29 | translation elongation factor EF-G | chloroplast - soybean | S35701 | complement of nt. 18–1085 of SEQ ID NO: 242
247 | 3.00E−23 | GTP CYCLOHYDROLASE I (GTP-CH-I) | Synechococcus sp. PCC 7942 | Q54769 | complement of nt. 1009–1257c of SEQ ID NO: 247
248 | 6.00E−05 | phospho-N-acetylmuramoyl-pentapeptide-transferase | Aquifex aeolicus | NP_213025.1 | nt. 830–1747 of SEQ ID NO: 248
25 | 2.00E−86 | PROBABLE TRANSPORT TRANSMEMBRANE PROTEIN | Ralstonia solanacearum | NP_522358.1 | complement of nt. 309–854 of SEQ ID NO: 25
25 | 7.00E−06 | major facilitator family transporter | Caulobacter crescentus | NP_419155.1 | complement of nt. 134–283 of SEQ ID NO: 25
250 | 1.00E−150 | CpdB | Pasteurella multocida | NP_246953.1 | complement of nt. 36–1016 of SEQ ID NO: 250
252 | 3.00E−57 | alanyl-tRNA synthetase | Vibrio cholerae | AAA99922.1 | complement of nt. 1418–1951 of SEQ ID NO: 252
253 | 1.00E−108 | similar to glutathione Reductase | Listeria monocytogenes EGD-e | NP_464432.1 | complement of nt. 411–1358 of SEQ ID NO: 253
259 | 3.00E−39 | hypothetical protein | Salmonella enterica subsp. enterica serovar Typhi | NP_458654.1 | complement of nt. 342–1037 of SEQ ID NO: 259
259 | 3.00E−17 | possible exported protein | Salmonella enterica subsp. enterica serovar Typhi | NP_458653.1 | complement of nt. 1251–1607 of SEQ ID NO: 259
261 | 5.00E−74 | hypothetical protein 6 - Haemophilus influenzae | Haemophilus influenzae | S27582 | complement of nt. 3–422 of SEQ ID NO: 261
263 | 1.00E−94 | putative transposase | Haemophilus paragallinarum | AAD01406.1 | nt. 2142–2672 of SEQ ID NO: 263
264 | 1.00E−126 | unknown | Actinobacillus actinomycetemcomitans | NP_067554.1 | nt. 40–714 of SEQ ID NO: 264
264 | 1.00E−103 | unknown | Actinobacillus actinomycetemcomitans | NP_067555.1 | nt. 695–1309 of SEQ ID NO: 264
264 | 2.00E−21 | unknown | Actinobacillus actinomycetemcomitans | NP_067556.1 | nt. 1302–1448 of SEQ ID NO: 264
265 | 6.00E−27 | Aminopeptidase 2 | chloroplast | Q42876 | nt. 556–1539 of SEQ ID NO: 265
268 | 1.00E−116 | MutY | Pasteurella multocida | NP_246257.1 | nt. 1003–1581 of SEQ ID NO: 268
272 | 1.00E−07 | hypothetical protein | Bacteriophage 933W | NP_049495.1 | complement of nt. 77–232 of SEQ ID NO: 272
274 | 3.00E−13 | unknown | Pasteurella multocida | NP_246952.1 | complement of nt. 1658–1975 of SEQ ID NO: 274
275 | 3.00E−20 | CafA | Neisseria gonorrhoeae | AAG24267.1 | nt. 1299–1571 of SEQ ID NO: 275
276 | 1.00E−45 | mukE protein | Vibrio cholerae | NP_231351.1 | complement of nt. 650–1390 of SEQ ID NO: 276
276 | 1.00E−69 | KicA | Pasteurella multocida | NP_245545.1 | complement of nt. 647–1321 of SEQ ID NO: 276
278 | 2.00E−56 | 3-oxoacyl-[acyl-carrier-protein] synthase III | Salmonella enterica subsp. enterica serovar Typhi | NP_455686.1 | nt. 1366–1944 of SEQ ID NO: 278
281 | 5.00E−56 | unknown | Pasteurella multocida | NP_246261.1 | complement of nt. 31–678 of SEQ ID NO: 281
282 | 3.00E−09 | orf25; similar to T gene of P2 | bacteriophage phi CTX | NP_490625.1 | complement of nt. 511–1032 of SEQ ID NO: 282
282 | 1.00E−08 | orf11; similar to phage P2 gene S-like product, which is involved in tail synthesis | Haemophilus somnus | AAC45165.1 | complement of nt. 1450–1584 of SEQ ID NO: 282
282 | 9.00E−27 | putative bacteriophage tail protein | Salmonella enterica subsp. enterica serovar Typhi | NP_457167.1 | complement of nt. 3–509 of SEQ ID NO: 282
286 | 5.00E−18 | plasmid-related protein | Listeria innocua plasmid | NP_471066.1 | complement of nt. 887–1501 of SEQ ID NO: 286
287 | 8.00E−20 | GTP cyclohydrolase II | Escherichia coli O157:H7 EDL933 | NP_287920.1 | nt. 2–145 of SEQ ID NO: 287
289 | 1.00E−168 | MODIFICATION METHYLASE HAEII | Haemophilus aegyptius | O30868 | complement of nt. 138–1091 of SEQ ID NO: 289
289 | 5.00E−11 | TYPE II RESTRICTION ENZYME HAEII | Haemophilus aegyptius | O30869 | complement of nt. 22–132 of SEQ ID NO: 289
289 | 6.00E−47 | mukF homolog | Haemophilus influenzae biotype aegyptius | AAB70828.1 | complement of nt. 1107–1385 of SEQ ID NO: 289
294 | 1.00E−171 | LICA PROTEIN | Haemophilus influenzae RM7004 | P14181 | complement of nt. 677–1564 of SEQ ID NO: 294
297 | 1.00E−158 | DNA methylase HsdM, putative | Vibrio cholerae | NP_231404.1 | complement of nt. 12–1136 of SEQ ID NO: 297
302 | 0 | HEME-BINDING PROTEIN A | Haemophilus influenzae DL42 | P33950 | nt. 3–1316 of SEQ ID NO: 302
304 | 6.00E−19 | hypothetical protein 6 | Haemophilus influenzae | S27582 | nt. 121–267 of SEQ ID NO: 304
305 | 6.00E−40 | putative recombinase - phage associated | Streptococcus pyogenes M1 GAS | NP_269557.1 | nt. 65–805 of SEQ ID NO: 305
305 | 7.00E−22 | single stranded DNA-binding protein | Shewanella sp. F1A | AAB57886.1 | nt. 1607–2014 of SEQ ID NO: 305
305 | 1.00E−43 | phage-related protein | Bacillus halodurans | NP_244410.1 | nt. 92–751 of SEQ ID NO: 305
312 | 1.00E−28 | PUTATIVE BACTERIOPHAGE-RELATED TRANSMEMBRANE PROTEIN | Ralstonia solanacearum | NP_518994.1 | nt. 1819–2673 of SEQ ID NO: 312
312 | 9.00E−25 | similar to BASEMENT MEMBRANE-SPECIFIC HEPARAN SULFATE PROTEOGLYCAN CORE PROTEIN PRECURSOR (HSPG) | Homo sapiens | XP_068727.1 | nt. 27–1001 of SEQ ID NO: 312
315 | 2.00E−45 | uracil permease | Deinococcus radiodurans | NP_296001.1 | complement of nt. 525–1592 of SEQ ID NO: 315
318 | 7.00E−15 | CzcD | Pasteurella multocida | NP_246276.1 | complement of nt. 3–227 of SEQ ID NO: 318
320 | 2.00E−60 | orf3; similar to endonuclease subunit of the phage P2 terminase (gene M) | Haemophilus somnus | AAC45159.1 | complement of nt. 606–1241 of SEQ ID NO: 320
320 | 2.00E−09 | orf4; similar to head completion/stabilization protein (gene L) of phage P2 | Haemophilus somnus | AAC45160.1 | complement of nt. 52–285 of SEQ ID NO: 320
320 | 3.00E−35 | orf2; similar to major capsid protein precursor of phage P2 (gene N) | Haemophilus somnus | AAC45158.1 | complement of nt. 1271–1624 of SEQ ID NO: 320
323 | 4.00E−37 | dedC protein | Escherichia coli | AAA23966.1 | complement of nt. 74–463 of SEQ ID NO: 323
324 | 1.00E−153 | conserved hypothetical protein | Neisseria meningitidis MC58 | NP_274972.1 | complement of nt. 930–1943 of SEQ ID NO: 324
326 | 5.00E−52 | selenophosphate synthetase | Eubacterium acidaminophilum | CAB53511.1 | complement of nt. 1186–2292 of SEQ ID NO: 326
328 | 1.00E−129 | secretion protein SecD | Pseudomonas aeruginosa | NP_252510.1 | complement of nt. 8–625 of SEQ ID NO: 328
333 | 3.00E−08 | unknown | Pasteurella multocida | NP_245489.1 | complement of nt. 5–418 of SEQ ID NO: 333
336 | 6.00E−38 | probable methyl transferase | Pseudomonas aeruginosa | NP_253353.1 | complement of nt. 2547–2819 of SEQ ID NO: 336
338 | 2.00E−98 | Pmi | Pasteurella multocida | NP_245766.1 | nt. 144–842 of SEQ ID NO: 338
339 | 2.00E−07 | tRNA nucleotidyltransferase | Escherichia coli | QQECPE | nt. 2331–2540 of SEQ ID NO: 339
340 | 0 | DNA gyrase, subunit A, type II topoisomerase | Salmonella typhimurium LT2 | NP_461214.1 | complement of nt. 93–1799 of SEQ ID NO: 340
342 | 4.00E−12 | tolA protein | Haemophilus influenzae | JC5212 | nt. 980–1318 of SEQ ID NO: 342
344 | 1.00E−07 | MODIFICATION METHYLASE HPHIA | Haemophilus parahaemolyticus | P50192 | complement of nt. 849–1034 of SEQ ID NO: 344
344 | 8.00E−05 | ABC transporter protein 1 | Leishmania major | AAF31030.1 | complement of nt. 17–205 of SEQ ID NO: 344
349 | 3.00E−44 | conserved hypothetical protein | Neisseria meningitidis MC58 | NP_273467.1 | complement of nt. 1397–1903 of SEQ ID NO: 349
349 | 8.00E−09 | hypothetical protein | Pseudomonas aeruginosa | NP_252667.1 | complement of nt. 795–1121 of SEQ ID NO: 349
349 | 9.00E−10 | conserved hypothetical secreted protein | Helicobacter pylori 26695 | NP_207009.1 | complement of nt. 1319–1816 of SEQ ID NO: 349
349 | 2.00E−06 | putative TPR repeat protein | Salmonella typhimurium LT2 | NP_463149.1 | complement of nt. 2244–2558 of SEQ ID NO: 349
35 | 1.00E−23 | type I restriction-modification system specificity determinant | Xylella fastidiosa 9a5c | NP_300003.1 | complement of nt. 29–388 of SEQ ID NO: 35
352 | 1.00E−116 | putative peptidase | Escherichia coli K12 | NP_416827.1 | complement of nt. 951–1640 of SEQ ID NO: 352
352 | 0 | unknown | Pasteurella multocida | NP_245275.1 | complement of nt. 86–946 of SEQ ID NO: 352
354 | 5.00E−86 | putative uronate isomerase | Salmonella typhimurium LT2 | NP_462052.1 | complement of nt. 168–914 of SEQ ID NO: 354
356 | 1.00E−07 | isomerase-like protein (DsbD) - | Escherichia coli | S57220 | nt. 5–73 of SEQ ID NO: 356
358 | 1.00E−07 | USG protein | Pediococcus pentosaceus | CAC16793.1 | nt. 534–1307 of SEQ ID NO: 358
358 | 0.005 | HsdS protein | Escherichia coli | CAA10700.1 | nt. 26–205 of SEQ ID NO: 358
361 | 1.00E−152 | maltodextrin phosphorylase | Escherichia coli O157:H7 EDL933 | NP_289957.1 | complement of nt. 77–922 of SEQ ID NO: 361
363 | 6.00E−06 | BH2505-unknown conserved protein | Bacillus halodurans | NP_243371.1 | nt. 554–844 of SEQ ID NO: 363
368 | 1.00E−12 | H02F09.3.p | Caenorhabditis elegans | NP_508295.1 | complement of nt. 1069–1977 of SEQ ID NO: 368
368 | 6.00E−27 | hypothetical glycine-rich protein | Mesorhizobium loti | NP_102360.1 | complement of nt. 1201–1986 of SEQ ID NO: 368
37  6.00E−09 putative ATP- Escherichia coli NP_415469.1 compliment of binding component K12 nt. 
455–691 of of atransport SEQ ID NO: 37 system 372 7.00E−18 conserved ClostridiumBAB80319.1 compliment of hypothetical protein perfringens nt. 1763–1924of SEQ ID NO: 372 376 7.00E−24 putative Salmonella NP_456379.1compliment of bacteriophage enterica subsp. nt. 158–808 of proteinenterica serovar SEQ ID NO: 376 Typhi 376 8.00E−10 hypothetical proteinXylella fastidiosa NP_298882.1 compliment of 9a5c nt. 1129–1671 of SEQID NO: 376 376 9.00E−06 Iin1713 Listeria innocua NP_471049.1 complimentof nt. 913–1557 of SEQ ID NO: 376 377 6.00E−05 Vng1732c Halobacteriumsp. NP_260487.1 nt. 2378–2587 of NRC-1 SEQ ID NO: 377 377 1.00E−11INVASIN Yersinia P31489 compliment of PRECURSOR enterocolitica nt.127–345 of (OUTER SEQ ID NO: 377 MEMBRANE ADHESIN) 382 4.00E−16 unknownPasteurella NP_246871.1 compliment of multocida nt. 967–1068 of SEQ IDNO: 382 383 4.00E−36 putative Streptomyces BAB69302.1 nt. 488–1162 oftransposase avermitilis SEQ ID NO: 383 383 3.00E−58 recombinase IncNplasmid R46 NP_511241.1 compliment of nt. 1–393 of SEQ ID NO: 383 3834.00E−24 transposase Escherichia coli I69674 nt. 1294–1740 of SEQ ID NO:383 383 0 tnpA Yersinia CAA73750.1 nt. 1782–2834 of enterocolitica SEQID NO: 383 385 2.00E−31 unknown Pasteurella NP_246065.1 nt. 1515–1772 ofmultocida SEQ ID NO: 385 386 5.00E−65 cydC [ Escherichia coli AAA66172.1compliment of nt. 3438–4115 of SEQ ID NO: 386 386 4.00E−33 ABCtransporter, Mesorhizobium NP_105463.1 compliment of ATP-binding lotint. 2569–3390 of protein SEQ ID NO: 386 388 3.00E−45 60 KDA INNER-Coxiella burnetii P45650 compliment of MEMBRANE nt. 3211–3759 PROTEIN ofSEQ ID NO: HOMOLOG 388 390 4.00E−25 putative DNA- Salmonella NP_458175.1nt. 1051–1416 of binding protein enterica subsp. SEQ ID NO: 390 entericaserovar Typhi 390 3.00E−13 transcriptional Bacillus NP_241773.1compliment of regulator halodurans nt. 84–578 of SEQ ID NO: 390 3903.00E−06 DNA translocase Staphylocoecus NP_372265.1 compliment of stageIII sporulation aureus subsp. nt. 
620–871 of prot homolog aureus Mu50SEQ ID NO: 390 395 7.00E−31 ATPase, Cu++ Homo sapiens NP_000044.1compliment of transporting, beta nt. 615–1406 of polypeptide SEQ ID NO:395 397 3.00E−23 terminase large Bacteriophage NP_112076.1 compliment ofsubunit HK620 nt. 2363–2725 of SEQ ID NO: 397 397 3.00E−16 hypotheticalprotein Xylella fastidiosa NP_297824.1 compliment of 9a5c nt. 1517–1744of SEQ ID NO: 397 398 4.00E−67 orf32 Haemophiius NP_536839.1 complimentof phage HP2 nt. 1288–1866 of SEQ ID NO: 398 398 8.00E−24 putativeSalmonella NP_463063.1 compliment of cytoplasmic protein typhimurium LT2nt. 798–1220 of SEQ ID NO: 398 398 2.00E−83 orf31 HaemophilusNP_043502.1 compliment of phage HP1 nt. 1881-2510 of SEQ ID NO: 398 3991.00E−94 HEME/HEMOPEXIN- Haemophilus P45355 nt. 88–774 of BINDINGinfluenzae N182 SEQ ID NO: 399 PROTEIN 401 3.00E−63 Sty SBLI SalmonellaCAA68058.1 nt. 1690–2742 of enterica SEQ ID NO: 401 401 3.00E−06RESTRICTION- Mycoplasma NP_325912.1 nt. 79–489 of MODIFICATION pulmonisSEQ ID NO: 401 ENZYME SUBUNIT M3 402 2.00E−13 OPACITY Neisseria Q05033compliment of PROTEIN OPA66 gonorrhoeae nt. 2634–2915 of PRECURSOR SEQID NO: 402 406 8.00E−13 type I restriction Neisseria NP_273876.1 nt.281–520 of enzyme EcoR124IIR meningitidis SEQ ID NO: 406 MC58 4076.00E−65 unknown Pasteurella NP_246237.1 nt. 938–2450 of multocida SEQID NO: 407 407 5.00E−99 PepE Pasteurella NP_245391.1 nt. 1216–1917 ofmultocida SEQ ID NO: 407 407 1.00E−16 Hemoglobin- Haemophilus Q48153 nt.1–141 of haptoglobin binding influenzae Tn106 SEQ ID NO: 407 protein A409  1.00E−106 hypothetical protein Haemophilus S27577 compliment of 1influenzae nt. 2524–3159 of SEQ ID NO: 409 411 4.00E−29 heme-repressibleHaemophilus AAB46794.1 nt. 391–615 of hemoglobin-binding influenzae,type b, SEQ ID NO: 411 protein strain HI689 411 0 Hemoglobin-Haemophilus Q48153 nt. 651–3263 of haptoglobin binding influenzae Tn106SEQ ID NO: 411 protein A 412 2.00E−07 REGULATORY bacteriophage P03036compliment of PROTEIN CRO 434 nt. 
59–259 of (ANTIREPRESSOR) SEQ ID NO:412 412 4.00E−06 hypothetical protein Bacteriophage CAC83535.1 nt.1436–1654 of P27 SEQ ID NO: 412 413 8.00E−07 hypothetical proteinDeinococcus NP_294301.1 compliment of radiodurans nt. 791–1012 of SEQ IDNO: 413 414 9.00E−65 conserved Vibrio cholerae NP_230092.1 nt. 1696–2103of hypothetical protein SEQ ID NO: 414 414 3.00E−93 unknown PasteurellaNP_246834.1 nt. 1777–2109 of multocida SEQ ID NO: 414 416 2.00E−17unknown Pasteurella NP_246629.1 compliment of multocida nt. 2565–2831 ofSEQ ID NO: 416 416 4.00E−26 hypothetical protein Escherichia coli S30728compliment of o154 nt. 1928–2254 of SEQ ID NO: 416 416 3.00E−37transport protein Pseudomonas NP_253757.1 compliment of TatC aeruginosant. 1494–2018 of of SEQ ID NO: 416 417 1.00E−66 weakly similar toListeria innocua NP_471073.1 compliment of methyltransferases nt.999–1928 of SEQ ID NO: 417 417 5.00E−05 DNA-BINDING PectobacteriumQ47587 compliment of PROTEIN RDGA carotovorum nt. 3526–4212 of SEQ IDNO: 417 417 2.00E−29 putative phage- Yersinia pestis NP_407132.1compliment of related protein nt. 2546–2938 of SEQ ID NO: 417 4173.00E−06 Adenine-specific Thermoplasma NP_393798.1 compliment of DNAmethylase acidophilum nt. 826–1020 of SEQ ID NO: 417 43  9.00E−16 PcnBPasteurella NP_245801.1 nt. 511–870 of multocida SEQ ID NO: 43 4342.00E−97 beta′ subunit of Nephroselmis NP_050840.1 compliment of RNApolymerase olivacea nt. 32–1534 of SEQ ID NO: 434 435 4.00E−52MODIFICATION Brevibacterium P10283 compliment of METHYLASE BEPIepidermidis nt. 11–565 of SEQ ID NO: 435 435 4.00E−57 pentafunctionalSaccharomyces NP_010412.1 compliment of arom polypeptide cerevisiae nt.757–2064 of (contains: 3- SEQ ID NO: 435 dehydroquinate synthase, 3-dehydroquinate, dehydratase (3- dehydroquinase), shikimate 5-dehydrogenase, shikimate kinase, and epsp synthase) 437 5.00E−70dihydrofolate Haemophilus S52336 nt. 
2393–2767 of reductase influenzaeSEQ ID NO: 437 (clinical isolate R1042) 438  1.00E−106 polyA polymeraseVibrio cholerae NP_230244.1 nt. 3–1124 of SEQ ID NO: 438 439 6.00E−60Porphyrin Salmonella NP_457816.1 nt. 2343–2783 of biosynthetic proteinenterica subsp. SEQ ID NO: 439 enterica serovar Typhi 441 5.00E−73 RimMPasteurella NP_246234.1 compliment of multocida nt. 151–441 of SEQ IDNO: 441 442 9.00E−31 LIPOPROTEIN Salmonella P40827 compliment of NLPDtyphimurium nt. 3362–3520 of SEQ ID NO: 442 444 6.00E−24 glycine betaineStaphylococcus NP_371872.1 compliment of transporter aureus subsp. nt.2242–2514 of aureus Mu50 SEQ ID NO: 444 452 6.00E−28 unknown PasteurellaNP_245616.1 compliment of multocida nt. 533–883 of SEQ ID NO: 452 452 0Type I restriction Escherichia coli Q47163 nt. 3291–4154 of enzymeEcoprrl M SEQ ID NO: 452 protein 452 2.00E−75 type I restrictionUreaplasma NP_077929.1 nt. 4156–4662 of enzyme M protein urealyticum SEQID NO: 452 455 9.00E−56 PROBABLE Ralstonia NP_520059.1 nt. 2028–2774 ofBACTERIOPHAGE solanacearum SEQ ID NO: 455 PROTEIN 455 2.00E−55 orf2;similar to Haemophilus AAC45158.1 nt. 2864–3490 of major capsid somnusSEQ ID NO: 455 protein precursor of phage P2 (gene N), 455  1.00E−175gpP Enterobacteria NP_046758.1 compliment of phage P2 nt. 127–1812 ofSEQ ID NO: 455 456 1.00E−38 hypothetical protein Pseudomonas NP_542872.1compliment of putida nt. 1010–1282 of SEQ ID NO: 456 456  1.00E−172hypothetical protein Pseudomonas NP_542873.1 compliment of putida nt.1443–2006 of SEQ ID NO: 546 457  1.00E−116 hypothetical proteinHaemophilus S15287 compliment of (galE 5′ region) - influenzae nt.62–961 of Haemophilus SEQ ID NO: 457 influenzae 457  1.00E−134dTDPglucose 4,6- Actinobacillus T00102 nt. 2637–3656 of dehydrataseactinomycetemco- SEQ ID NO: 457 mitans 459 2.00E−10 RNA polymeraseSynechocystis sp. NP_441586.1 nt. 25–117 of gamma-subunit PCC 6803 SEQID NO: 459 461 9.00E−51 conserved Staphylococcus NP_370593.1 nt.4124–4624 of hypothetical protein aureus subsp. 
SEQ ID NO: 461 aureusMu50 462 9.00E−06 NADH Burkholderia AAG01016.1 nt. 703–828 ofdehydrogenase pseudomallei SEQ ID NO: 462 465 3.00E−41 GTP-bindingSynechocystis sp. NP_441951.1 compliment of protein Era PCC 6803 nt.2470–2787 of SEQ ID NO: 465 466 1.00E−15 putative Salmonella NP_455548.1nt. 837–1478 of bacteriophage enterica subsp. SEQ ID NO: 466 proteinenterica serovar Typhi 466 1.00E−90 orf31 Haemophilus NP_043502.1 nt.2396–3199 of phage HP1 SEQ ID NO: 466 469 0 Hemoglobin and HaemophilusQ9X442 compliment of hemoglobin- influenzae HI689 nt. 427–3459 ofhaptoglobin binding SEQ ID NO: 469 protein C precursor 471 8.00E−05transposase, Neisseria NP_274608.1 nt. 2957–3217 of putativemeningitidis SEQ ID NO: 471 MC58 472 6.00E−08 hypothetical proteinSalmonella NP_458660.1 compliment of enterica subsp. nt. 2881–3270 ofenterica serovar SEQ ID NO: 472 Typhi 472 5.00E−23 antirestrictionMesorhizobium NP_106707.1 nt. 4908–5324 of protein loti SEQ ID NO: 472472 1.00E−75 hypothetical protein Salmonella NP_458661.1 compliment ofenterica subsp. nt. 1931–2776 of enterica serovar SEQ ID NO: 472 Typhi472 9.00E−72 hypothetical protein Salmonella NP_458662.1 compliment ofenterica subsp. nt. 544–1689 of enterica serovar SEQ ID NO: 472 Typhi475 3.00E−25 unknown Pasteurella NP_244952.1 nt. 3207–3626 of multocidaSEQ ID NO: 475 476 8.00E−73 putative DNA- Salmonella NP_458175.1compliment of binding protein enterica subsp. nt. 3339–4310 of entericaserovar SEQ ID NO: 476 Typhi 476 6.00E−47 anticodon nuclease NeisseriaNP_273873.1| compliment of meningitidis nt. 4397–4885 of MC58 SEQ ID NO:476 478 3.00E−06 methionin Arabidopsis CAB38313.1 compliment ofsynthase-like thaliana nt. 3554–3679 of enzyme SEQ ID NO: 478 4783.00E−05 unknown Pasteurella NP_245444.1 compliment of multocida nt.164–250 of SEQ ID NO: 478 479 1.00E−18 conserved Xylella fastidiosaNP_298841.1 nt. 
2302–2658 of hypothetical protein 9a5c SEQ ID NO: 47948  3.00E−19 Dca Neisseria AAF12796.1 compliment of gonorrhoeae nt.225–746 of SEQ ID NO: 48 482 1.00E−06 hypothetical protein NeisseriaNP_275122.1 nt. 2055–2189 of meningitidis SEQ ID NO: 482 MC58 4829.00E−28 conserved Neisseria NP_274383.1 nt. 1689–1898 of hypotheticalprotein meningitidis SEQ ID NO: 482 MC58 487 5.00E−75 conservedNeisseria NP_284304.1 nt. 2541–2978 of hypothetical protein meningitidisSEQ ID NO: 487 Z2491 488 2.00E−64 unknown Pasteurella NP_246617.1 nt.2983–3540 of multocida SEQ ID NO: 488 488 8.00E−93 1-deoxy-D-xyluloseZymomonas AAD29659.1 nt. 1344–1880 of 5-phosphate mobilis SEQ ID NO: 488reductoisomerase 491 5.00E−51 rubredoxin Clostridium AAB50346.1compliment of oxidoreductase acetobutylicum nt. 1690–2439 of homolog SEQID NO: 491 492 1.00E−27 Phosphotransferase Staphylococcus AAK83253.1compliment of system enzyme aureus nt. 755–970 of IIA-like protein SEQID NO: 492 493 2.00E−84 unknown Actinobacillus AAC70895.1 nt. 3333–3935of actinomycetemco- SEQ ID NO: 493 mitans 493 4.00E−49 unknownHelicobacter NP_223898.1 nt. 3345–4010 of pylori J99 SEQ ID NO: 493 4939.00E−31 transcriptional Acinetobacter AAF20290.1 nt. 1885–2793 offactor MdcH calcoaceticus SEQ ID NO: 493 493 6.00E−30 HimA PasteurellaNP_245565.1 nt. 1129–1260 of multocida SEQ ID NO: 493 494 4.00E−85putative prophage Yersinia pestis NP_404712.1 nt. 900–2099 of integraseSEQ ID NO: 494 494 4.00E−63 DNA Xylella fastidiosa NP_299063.1compliment of methyltransferase 9a5c nt. 5544–6170 of SEQ ID NO: 494 4946.00E−19 MODIFICATION Lactococcus lactis P34877 compliment of METHYLASEsubsp. cremoris nt. 5019–6113 of SCRFIA SEQ ID NO: 494 497 0transferrin-binding Haemophilus S70906 nt. 3251–4999 of protein 1influenzae (strain SEQ ID NO: 497 PAK 12085) 50  5.00E−07 AcpPPasteurella NP_246856.1 nt. 2–136 of multocida SEQ ID NO: 50 5017.00E−50 conserved Vibrio cholerae NP_231403.1 compliment ofhypothetical protein nt. 
3649–4872 of SEQ ID NO: 501 501 0 type Irestriction Vibrio cholerae NP_231400.1 compliment of enzyme HsdR, nt.1551–3440 of putative SEQ ID NO: 501 501 4.00E−13 ATP-dependentDeinococcus NP_295921.1 compliment of DNA helicase radiodurans nt.5317–5844 of RecG-related SEQ ID NO: 501 protein 501 5.00E−11 conservedUreaplasma NP_077868.1 compliment of hypothetical urealyticum nt.5098–5769 of SEQ ID NO: 501 504 2.00E−44 OUTER Haemophilus Q48218compliment of MEMBRANE influenzae nt. 4681–5019 of PROTEIN P2 AG30010SEQ ID NO: 504 PRECURSOR (OMP P2) 507 0 SpoT Pasteurella NP_245857.1compliment of multocida nt. 3685–5316 of SEQ ID NO: 507 51  6.00E−87glucosamine- Vibrio cholerae NP_230141.1 nt. 30–470 of fructose-6- SEQID NO: 51 phosphate aminotransferase (isomerizing) 512 2.00E−28dipeptide transport Yersinia pestis NP_407439.1 compliment of systempermease nt. 1095–1580 of protein SEQ ID NO: 512 512 3.00E−82 SapCPasteurella NP_245850.1 compliment of multocida nt. 730–1095 of SEQ IDNO: 512 514 9.00E−06 putative integral Campylobacter NP_281236.1compliment of membrane protein jejuni nt. 577–684 of SEQ ID NO: 514 5143.00E−11 orf, hypothetical Escherichia coli NP_286004.1 compliment ofprotein O157: H7 EDL933 nt. 449–568 of SEQ ID NO: 514 518 0 putativeinner Neisseria NP_284893.1 nt. 92–1927 of membrane trans- meningitidisSEQ ID NO: 518 acylase protein Z2491 519 4.00E−30 hypothetical proteinMesorhizobium NP_108196.1 compliment of loti nt. 2221–3159 of SEQ ID NO:519 519 2.00E−12 conserved Listeria innocua NP_471067.1 compliment ofhypothetical protein nt. 3994–5241 of SEQ ID NO: 519 519 6.00E−20hypothetical protein Mesorhizobium NP_108198.1 compliment of loti nt.707–1552 of SEQ ID NO: 519 519 4.00E−26 putative Salmonella NP_455526.1compliment of bacteriophage enterica subsp. nt. 3982–5163 of proteinenterica serovar SEQ ID NO: 519 Typhi 52  3.00E−94 OUTER HaemophilusQ48218 nt. 
45–788 of MEMBRANE influenzae SEQ ID NO: 52 PROTEIN P2PRECURSOR (OMP P2) 520 0 excision nuclease Escherichia coli NP_418482.1compliment of subunit A K12 nt. 6309–7745 of SEQ ID NO: 520 521 5.00E−08zinc/manganese Rickettsia conorii NP_359651.1 nt. 2236–2652 of ABCtransporter SEQ ID NO: 521 substrate binding protein 521  1.00E−140unknown Pasteurella NP_245865.1| nt. 338–1390 of multocida SEQ ID NO:521 521 1.00E−86 ORF_f432 Escherichia coli AAB40463.1 nt. 203–1390 ofSEQ ID NO: 521 522 3.00E−22 unknown Pasteurella NP_246093.1 nt. 670–885of multocida SEQ ID NO: 522 526 5.00E−33 exodeoxyribonuclease Yersiniapestis NP_404635.1 nt. 5582–6202 of V alpha chain SEQ ID NO: 526 5261.00E−62 exodeoxyribonuclease Vibrio cholerae NP_231950.1 nt. 5675–6193of V, 67 kDa SEQ ID NO: 526 subunit 527  1.00E−147 unknown PasteurellaNP_245980.1 nt. 4283–5203 of multocida SEQ ID NO: 527 527 0 MfdPasteurella NP_245978.1 nt. 7545–8759 of multocida SEQ ID NO: 527 527 0transcription-repair Salmonella NP_455708.1 nt. 7611–8762 of couplingfactor enterica subsp. SEQ ID NO: 527 (TrcF) enterica serovar Typhi 5270 PROBABLE Ralstonia NP_519763.1 nt. 7611–8870 of TRANSCRIPTION-solanacearum SEQ ID NO: 527 REPAIR COUPLING FACTOR PROTEIN 528 1.00E−48undecaprenyl Chlamydia NP_297109.1 nt. 2918–3712 of pyrophosphatemuridarum SEQ ID NO: 528 synthetase 528 0 leucyl-tRNA Vibrio choleraeNP_230603.1 compliment of synthetase nt. 180–2822 of SEQ ID NO: 528 529 1.00E−104 DNA PRIMASE Legionella P71481 compliment of pneumophila nt.3316–3960 of SEQ ID NO: 529 534 9.00E−29 putative integrase SalmonellaNP_461690.1 nt. 4668–5009 of typhimurium LT2 SEQ ID NO: 534 534 6.00E−18hypothetical protein Neisseria NP_283002.1 compliment of NMA0153meningitidis nt. 5933–6337 of Z2491 SEQ ID NO: 534 534 2.00E−23hypothetical protein Deinococcus NP_294868.1 nt. 6908–7654 ofradiodurans SEQ ID NO: 534 534 1.00E−88 prophage CP4-57 Escherichia coliNP_417111.1 nt. 5057–5875 of integrase K12 SEQ ID NO: 534 535  1.00E−115phosphate Buchnera sp. 
NP_240007.1 nt. 3385–4596 of acetyltransferaseAPS SEQ ID NO: 535 536 3.00E−35 cobalt membrane ActinobacillusAAD49727.1 compliment of transport protein pleuropneumoniae nt.3531–4136 of CbiQ SEQ ID NO: 536 536 6.00E−37 unknown PasteurellaNP_245305.1 compliment of multocida nt. 6478–6921 of SEQ ID NO: 536 5392.00E−26 Orf122 Chlorobium AAG12204.1 compliment of tepidum nt.1778–2008 of SEQ ID NO: 539 540 1.00E−77 heat shock protein NeisseriaNP_273864.1 compliment of HtpX meningitidis nt. 2567–3481 of MC58 SEQ IDNO: 540 541 0 IleS Pasteurella NP_246601.1 nt. 3167–4549 of multocidaSEQ ID NO: 541 545 2.00E−09 DNA-BINDING Pectobacterium Q47588 nt.3816–3977 of PROTEIN RDGB carotovorum SEQ ID NO: 545 545 2.00E−11putative Sinorhizobium NP_437741.1 compliment of transposase melilotint. 2786–3019 of SEQ ID NO: 544 545 2.00E−07 Hypothetical 42.5Escherichia coli BAA77933.1 compliment of kd protein in thrW- nt.2614–2811 of argF intergenic SEQ ID NO: 545 region 545 4.00E−18 putativeIS element Salmonella NP_454711.1 nt. 1955–2230 of transposase entericasubsp. SEQ ID NO: 545 enterica serovar Typhi 546 0 HEME/HEMOPEXIN-Haemophilus P45354 nt. 5551–7809 of BINDING influenzae SEQ ID NO: 546PROTEIN 546 0 HEME/HEMOPEXIN Haemophilus P45356 nt. 3842–5536 ofUTILIZATION influenzae SEQ ID NO: 546 PROTEIN B 546 0 HEME/HEMOPEXINHaemophilus P45357 nt. 1638–3176 of UTILIZATION influenzae SEQ ID NO:546 PROTEIN C 546 2.00E−12 HasR Pasteurella NP_246561.1 nt. 3149–3763 ofmultocida SEQ ID NO: 546 549 0 unknown Pasteurella NP_246821.1 nt.2526–3512 of multocida SEQ ID NO: 549 549  1.00E−121 putative membraneYersinia pestis NP_404859.1 nt. 605–1108 of protein SEQ ID NO: 549 549 0unknown Pasteurella NP_246822.1 nt. 1122–1664 of multocida SEQ ID NO:549 551  1.00E−157 type I restriction- Xylella fastidiosa NP_300016.1compliment of modification 9a5c nt. 7396–8322 of system SEQ ID NO: 551endonuclease 552  1.00E−100 valyl-tRNA Deinococcus NP_293872.1compliment of synthetase radiodurans nt. 
6691–8688 of SEQ ID NO: 552 5520 VALYL-TRNA Haemophilus P36432 compliment of SYNTHETASE parainfluenzaent. 5850–6647 of SEQ ID NO: 552 553 0 DNA-directed RNA Vibrio choleraeNP_229982.1 nt. 2668–6699 of polymerase, beta SEQ ID NO: 553 subunit 5540 iron utilization Haemophilus T10887 nt. 991–2508 of protein Binfluenzae SEQ ID NO: 554 559  1.00E−100 PREPROTEIN Bacillus firmusP96313 nt. 3420–4472 of TRANSLOCASE SEQ ID NO: 559 SECA SUBUNIT 56 2.00E−23 RpL30 Pasteurella NP_246336.1 compliment of multocida nt.656–832 of SEQ ID NO: 56 56  9.00E−13 RpS5 Pasteurella NP_246337.1compliment of multocida nt. 843–1064 of SEQ ID NO: 56 560  1.00E−157Na+/H+ antiporter Vibrio cholerae NP_231535.1 2 compliment of nt.279–2989 of SEQ ID NO: 560 562 1.00E−72 putative biotin Yersinia pestisNP_404419.1 nt. 7862–8878 of sulfoxide reductase SEQ ID NO: 562 2 562 1.00E−125 restriction Neisseria CAA09003.1 nt. 2–985 of modificationmeningitidis SEQ ID NO: 562 system-R protein 563 0 IMMUNOGLOBULINHaemophilus P45384 compliment of A1 PROTEASE influenzae HK715 nt.4127–9508 of SEQ ID NO: 563 563 0 3- Schizosaccharo- O14289 nt.1980–3983 of ISOPROPYLMALATE myces pombe SEQ ID NO: 563 DEHYDRATASE(IPMI) 564 2.00E−79 orf32 Haemophilus NP_536839.1 nt. 6241–6831 of phageHP2 SEQ ID NO: 564 564 7.00E−33 probable variable Salmonella NP_457882.1nt. 3707–4177 of tail fibre protein enterica subsp. SEQ ID NO: 564enterica serovar Typhi 564 2.00E−14 M protein Enterobacteria NP_052264.1nt. 1905–2213 of phage 186 SEQ ID NO: 564 564 4.00E−44 similar to tailfiber Salmonella NP_461635.1 nt. 3171–3692 of protein (gpH) intyphimurium LT2, SEQ ID NO: 564 phage P2 Fels-2 prophage 564 2.00E−85gpJ Enterobacteria NP_046773.1 nt. 2267–3166 of phage P2 SEQ ID NO: 564564 1.00E−24 hypothetical protein Neisseria NP_284534.1 nt. 6852–7334 ofmeningitidis SEQ ID NO: 564 Z2491 564 4.00E−26 gpv EnterobacteriaNP_046771.1 nt. 1337–1912 of phage P2 SEQ ID NO: 564 564 2.00E−47similar to Escherichia coli BAA16182.1 nt. 
11383–11961 [SwissProt P44255of SEQ ID NO: 564 564 2.00E−51 hypothetical protein NeisseriaNP_284066.1 nt. 10452–11180 NMA1315 meningitidis of SEQ ID NO: Z2491 564564 0 orf31 Haemophilus NP_043502.1 nt. 4160–6226 of phage HP1 SEQ IDNO: 564 564 2.00E−09 rep Haemophilus NP_536816.1 compliment of phage HP2nt. 9986–10234 of SEQ ID NO: 564 565 2.00E−57 resolvase/ HaemophilusAAL47097.1 nt. 11885–12445 integrase-like influenzae biotype of SEQ IDNO: protein aegyptius 565 565 1.00E−93 integrase ActinobacillusAAC70901.1 compliment of actinomycetemco- nt. 4118–4900 mitans of SEQ IDNO: 565 565 6.00E−35 probable phage Salmonella NP_458745.1 compliment ofintegrase enterica subsp. nt. 4148–4990 of enterica serovar SEQ ID NO:565 Typhi 565  1.00E−107 hypothetical protein Xylella fastidiosaNP_299042.1 compliment of 9a5c nt. 5066–6817 of SEQ ID NO: 565 566 1.00E−126 hypothetical protein Haemophilus S15287 compliment of (galE5′ region) - influenzae nt. 10726–11607 of SEQ ID NO: 566 567 0 unknownPasteurella NP_246387.1 nt. 5343–7688 of multocida SEQ ID NO: 567 568 1.00E−151 multidrug Escherichia coli NP_311575.1 nt. 6–1403 ofresistance O157: H7 SEQ ID NO: 568 membrane translocase 568  1.00E−141YhbX/YhjW/YijP/Yj Neisseria NP_275002.1 compliment of dB family proteinmeningitidis nt. 11213–12634 MC58 of SEQ ID NO: 568 570  1.00E−180hypothetical protein Haemophilus S71024 compliment of 3 (ksgA-lic2Binfluenzae (strain nt. 12845–13720 intergenic region) RM7004) of SEQ IDNO: 570 571 0 glycerophospho- Haemophilus A43576 nt. 1656–2693 ofdiester influenzae (isolate SEQ ID NO: 571 phosphodiesterase 772) 571 1.00E−137 outer membrane Haemophilus A43604 nt. 6145–6909 of protein P4influenzae SEQ ID NO: 571 precursor - Haemophilus influenzae 5712.00E−72 CG8298 gene Drosophila AAF58597.1 nt. 3813–5339 of product [alt1] melanogaster SEQ ID NO: 571 572 1.00E−40 hypothetical proteinChlamydia G81737 nt. 
3734–4099 of TC0130 muridarum (strain SEQ ID NO:572 Nigg) 572 5.00E−10 hypothetical protein Pyrococcus NP_142215.1 nt.4472–4888 of horikoshii SEQ ID NO: 572 572 3.00E−11 109aa longSulfolobus NP_377117.1 nt. 7303–7470 of hypothetical protein tokodaiiSEQ ID NO: 572 572 8.00E−43 hypothetical protein ChlamydophilaNP_445524.1 nt. 4289–4618 of pneumoniae SEQ ID NO: 572 AR39 572 9.00E−08CDH1-D Gallus gallus AAL31950.1 nt. 7183–7521 of SEQ ID NO: 572 575 1.00E−173 topoisomerase B Salmonella NP_458624.1 nt. 18980–20923enterica subsp. of SEQ ID NO: enterica serovar 575 Typhi 575  1.00E−100DNA helicase Salmonella NP_458617.1 nt. 10399–11706 enterica subsp. ofSEQ ID NO: enterica serovar 575 Typhi 65  2.00E−53 Sufl PasteurellaNP_245041.1 nt. 3–821 of multocida SEQ ID NO: 65 67  4.00E−39 putativeMFS Salmonella NP_462786.1 compliment of family tranport typhimurium LT2nt. 125–1033 of protein (1st mdule) SEQ ID NO: 67 7  4.00E−29 putativemembrane Salmonella NP_458664.1 compliment of protein enterica subsp.nt. 2–559 of enterica serovar SEQ ID NO: 7 Typhi 72  2.00E−51 serinetransporter Vibrio cholerae NP_230946.1 nt. 18–803 of SEQ ID NO: 72 74 3.00E−90 hypothetical 21.8 K Haemophilus JH0436 compliment of protein(in locus influenzae nt. 248–766 of involved in SEQ ID NO: 74transformation) - 77  2.00E−18 RecX protein Legionella CAC33485.1 nt.480–920 of pneumophila SEQ ID NO: 77 82  4.00E−95 unknown PasteurellaNP_246414.1 nt. 128–955 of multocida SEQ ID NO: 82 83  2.00E−66 unknownPasteurella NP_246777.1 nt. 5–556 of multocida SEQ ID NO: 83 83 6.00E−33 CTP SYNTHASE Helicobacter NP_223042.1 compliment of pylori J99nt. 1027–1338 of SEQ ID NO: 83. 83  4.00E−34 CTP synthase CampylobacterNP_281249.1 compliment of jejuni nt. 1024–1275 of SEQ ID NO: 83 84 1.00E−16 REPRESSOR Bacteriophage P14819 nt. 823–1233 of PROTEIN Clphi-80 SEQ ID NO: 84 84  2.00E−05 orf, hypothetical Escherichia coliNP_415875.1 compliment of protein K12 nt. 
533–700 of SEQ ID NO: 84 84 4.00E−11 orf33 bacteriophage phi NP_490633.1 compliment of CTX nt.32–466 of SEQ ID NO: 84 85  3.00E−42 SpoT Pasteurella NP_245857.1 nt.899–1261 of multocida SEQ ID NO: 85 90   1.00E−103 putative methylaseBacteriophage NP_108695.1 compliment of Tuc2009 nt. 478–1206 of SEQ IDNO:90 90  4.00E−11 probable adenine Thermoplasma NP_394624.1 complimentof specific DNA acidophilum nt. 397–1140 of methyltransferase SEQ ID NO:90

TABLE 3B

Hit Identity | Full Length Nucleotide Sequence | Amino Acid Sequence | Location in Contig | Homology to Genbank Protein
CpdB | SEQ ID NO: 686 | SEQ ID NO: 687 | nt. 38041–36068 of SEQ ID NO: 681 (contig 14) | NP_246953.1
putative membrane protein | SEQ ID NO: 688 | SEQ ID NO: 689 | nt. 906601–908094 of SEQ ID NO: 685 (contig 18) | NP_458664.1
GTP-binding protein TypA/BipA | SEQ ID NO: 690 | SEQ ID NO: 691 | nt. 42557–40995 of SEQ ID NO: 683 (contig 16) | NP_240245.1
outer membrane protein A | SEQ ID NO: 692 | SEQ ID NO: 693 | nt. 7000420–704187 of SEQ ID NO: 685 (contig 18) | T30852
vacB protein | SEQ ID NO: 694 | SEQ ID NO: 695 | nt. 39184–36836 of SEQ ID NO: 683 (contig 16) | NP_240369.1
putative ABC transport system permease protein | SEQ ID NO: 696 | SEQ ID NO: 697 | nt. 59155–58370 of SEQ ID NO: 685 (contig 18) | NP_282774.1
putative exported protein | SEQ ID NO: 698 | SEQ ID NO: 699 | nt. 901142–902542 of SEQ ID NO: 685 (contig 18) | NP_458655.1
ImpA | SEQ ID NO: 700 | SEQ ID NO: 701 | nt. 348187–347747 of SEQ ID NO: 685 (contig 18) | NP_245829.1
TsaA | SEQ ID NO: 702 | SEQ ID NO: 703 | nt. 74941–75548 of SEQ ID NO: 684 (contig 17) | NP_245732.1
PROBABLE TRANSPORT TRANSMEMBRANE PROTEIN | SEQ ID NO: 704 | SEQ ID NO: 705 | nt. 74436–75176 of SEQ ID NO: 685 (contig 18) | NP_522358.1
 | SEQ ID NO: 706 | SEQ ID NO: 707 | nt. 75160–75660 of SEQ ID NO: 685 (contig 18) |
possible exported protein | SEQ ID NO: 708 | SEQ ID NO: 709 | nt. 899618–900262 of SEQ ID NO: 685 (contig 18) | NP_458653.1
LICA PROTEIN | SEQ ID NO: 710 | SEQ ID NO: 711 | nt. 356917–355958 of SEQ ID NO: 685 (contig 18) | P14181
HEME-BINDING PROTEIN A | SEQ ID NO: 712 | SEQ ID NO: 713 | nt. 26114–27739 of SEQ ID NO: 683 (contig 16) | P33950
similar to BASEMENT MEMBRANE-SPECIFIC HEPARAN SULFATE PROTEOGLYCAN CORE PROTEIN PRECURSOR (HSPG) | SEQ ID NO: 714 | SEQ ID NO: 715 | nt. 311610–312683 of SEQ ID NO: 685 (contig 18) | XP_068727.1
CzcD | SEQ ID NO: 716 | SEQ ID NO: 717 | nt. 34865–35542 of SEQ ID NO: 681 (contig 14) | NP_246276.1
conserved hypothetical protein | SEQ ID NO: 718 | SEQ ID NO: 719 | nt. 194993–193977 of SEQ ID NO: 685 (contig 18) | NP_274972.1
secretion protein SecD | SEQ ID NO: 720 | SEQ ID NO: 721 | nt. 203707–201857 of SEQ ID NO: 683 (contig 17) | NP_252510.1
ABC transporter protein 1 | SEQ ID NO: 722 | SEQ ID NO: 723 | nt. 3943–5859 of SEQ ID NO: 681 (contig 14) | AAF31030.1
conserved hypothetical protein | SEQ ID NO: 724 | SEQ ID NO: 725 | nt. 331090–331749 of SEQ ID NO: 685 (contig 18) | NP_273467.1
 | SEQ ID NO: 726 | SEQ ID NO: 727 | nt. 331938–332492 of SEQ ID NO: 685 (contig 18) |
 | SEQ ID NO: 728 | SEQ ID NO: 729 | nt. 332681–33232 of SEQ ID NO: 685 (contig 18) |
INVASIN PRECURSOR (OUTER MEMBRANE ADHESIN) | SEQ ID NO: 730 | SEQ ID NO: 731 | nt. 416757–417020 of SEQ ID NO: 685 (contig 18) | P31489
HEME/HEMOPEXIN-BINDING PROTEIN | SEQ ID NO: 732 | SEQ ID NO: 733 | nt. 229430–232195 of SEQ ID NO: 384 (contig 17) | P45355
OPACITY PROTEIN OPA66 PRECURSOR | SEQ ID NO: 734 | SEQ ID NO: 735 | nt. 375592–375879 of SEQ ID NO: 384 (contig 17) | Q05033
Hemoglobin-haptoglobin binding protein A | SEQ ID NO: 736 | SEQ ID NO: 737 | nt. 45709–42566 of SEQ ID NO: 681 (contig 14) | Q48153
transport protein TatC | SEQ ID NO: 738 | SEQ ID NO: 739 | nt. 134452–135222 of SEQ ID NO: 384 (contig 17) | NP_253757.1
LIPOPROTEIN NLPD | SEQ ID NO: 740 | SEQ ID NO: 741 | nt. 18895–20112 of SEQ ID NO: 682 (contig 15) | P40827
Hemoglobin and hemoglobin-haptoglobin binding protein C precursor | SEQ ID NO: 742 | SEQ ID NO: 743 | nt. 34181–31041 of SEQ ID NO: 682 (contig 15) | Q9X442
HimA | SEQ ID NO: 744 | SEQ ID NO: 745 | nt. 382795–383085 of SEQ ID NO: 685 (contig 18) | NP_245565.1
transferrin-binding protein 1 | SEQ ID NO: 746 | SEQ ID NO: 747 | nt. 178537–175799 of SEQ ID NO: 683 (contig 16) | S70906
SapC | SEQ ID NO: 748 | SEQ ID NO: 749 | nt. 197754–196867 of SEQ ID NO: 685 (contig 18) | NP_245850.1
heat shock protein HtpX | SEQ ID NO: 750 | SEQ ID NO: 751 | nt. 40414–41265 of SEQ ID NO: 682 (contig 15) | NP_273864.1
HEME/HEMOPEXIN-BINDING PROTEIN | SEQ ID NO: 752 | SEQ ID NO: 753 | nt. 229430–232195 of SEQ ID NO: 684 (contig 17) | P45354
HEME/HEMOPEXIN UTILIZATION PROTEIN B | SEQ ID NO: 754 | SEQ ID NO: 755 | nt. 227721–229418 of SEQ ID NO: 684 (contig 17) | P45356
HEME/HEMOPEXIN UTILIZATION PROTEIN C | SEQ ID NO: 756 | SEQ ID NO: 757 | nt. 225516–227645 of SEQ ID NO: 684 (contig 17) | P45357; NP_246561.1
iron utilization protein B | SEQ ID NO: 758 | SEQ ID NO: 759 | nt. 32076–33611 of SEQ ID NO: 684 (contig 17) | T10887
PREPROTEIN TRANSLOCASE SECA SUBUNIT | SEQ ID NO: 760 | SEQ ID NO: 761 | nt. 82314–84785 of SEQ ID NO: 683 (contig 16) | P96313
IMMUNOGLOBULIN A1 PROTEASE | SEQ ID NO: 762 | SEQ ID NO: 763 | nt. 171647–166263 of SEQ ID NO: 683 (contig 16) | P45384
multidrug resistance membrane translocase | SEQ ID NO: 764 | SEQ ID NO: 765 | nt. 74524–72992 of SEQ ID NO: 683 (contig 16) | NP_311575.1
YhbX/YhjW/YijP/YjdB family protein | SEQ ID NO: 766 | SEQ ID NO: 767 | nt. 61734–63200 of SEQ ID NO: 683 (contig 16) | NP_275002.1
putative membrane protein | SEQ ID NO: 768 | SEQ ID NO: 769 | nt. 906601–908094 of SEQ ID NO: 685 (contig 18) | NP_458664.1
putative membrane protein | SEQ ID NO: 770 | SEQ ID NO: 771 | nt. 16185–17942 of SEQ ID NO: 683 (contig) | NP_404859.1

EXAMPLE 3 Construction of the NTHi Promoter Trap Library

To identify potential virulence determinants of NTHi, bacterial gene expression was monitored by differential fluorescence induction (DFI) during early disease progression in one specific anatomical niche of a chinchilla model of NTHi-induced otitis media (OM). Genomic DNA fragments from NTHi strain 86-028NP were cloned upstream of the promoterless gfpmut3 gene using a promoter trap library. Plasmid pGZRS-39A, a derivative of pGZRS-1 isolated from Actinobacillus pleuropneumoniae, is an A. pleuropneumoniae-Escherichia coli shuttle vector. This plasmid contains the origin of replication from A. pleuropneumoniae, the lacZα gene from pUC19 and the kanamycin resistance gene from Tn903 (West et al., Gene, 160: 81-86, 1995).

The promoter trap vector was constructed by cloning the GFP mutant gfpmut3 gene, as a BamHI to EcoRI fragment, into pGZRS-39A to form pRSM2167. This mutant GFP gene contains two amino acid changes, S65G and S72A, that enhance fluorescence emission when excited at 488 nm. This mutant also has high solubility and fast kinetics of chromophore formation (Cormack et al., Gene, 173: 33-38, 1996). This plasmid was transformed by electroporation into NTHi strain 86-028NP, generating the parent-plasmid strain 86-028NP/pRSM2169.

Random genomic DNA fragments (described in Example 1) were prepared for ligation into the promoter probe vector. Genomic DNA was isolated from strain 86-028NP using the Puregene DNA isolation kit (Gentra Systems, Minneapolis, Minn.) according to the manufacturer's protocol. Due to restriction barriers, it was necessary to isolate the plasmid DNA and use this for the library generation. The isolated DNA was partially digested with Sau3AI (NEB, Beverly, MA; 0.25 units/μg DNA) for 1 hour at 37° C., separated by gel electrophoresis, and DNA fragments 0.5-1.5 kb in size were recovered using the Qiagen gel extraction kit. For vector preparation, pRSM2167 was isolated from an overnight culture using the Wizard Plus Maxiprep DNA purification system (Promega, Madison, Wis.) according to the manufacturer's protocol.

Plasmid DNA was linearized by BamHI digestion and 5′ phosphate groups were removed by treatment with calf intestinal alkaline phosphatase (CIAP; GibcoBRL Life Technologies). Genomic DNA fragments were ligated with the linearized, phosphatase-treated vector and electroporated into competent NTHi strain 86-028NP prepared for electroporation according to a modified protocol (Mitchell et al., Nucleic Acids Res., 19: 3625-3628, 1991). When plasmid DNA was electroporated back into NTHi strain 86-028NP, transformation efficiency was improved by one thousand-fold. Briefly, cells were grown to an OD₆₀₀=0.3 in sBHI (brain heart infusion) broth at 37° C., 220 rpm. Cells were chilled on ice for 30 minutes and subsequently washed with an equal volume of 0.5×SG (1×SG: 15% glycerol, 272 mM sucrose) at 4° C. Washes were repeated a total of three times. Subsequently, the cells were resuspended in 1×SG to a 100× concentrated volume. The cells were electroporated using the BioRad Gene Pulser II set at 200 ohms, 2.5 kV and 25 μF and then diluted in 1 ml prewarmed sBHI, incubated for 2 hours at 37° C., 5% CO₂ and plated on chocolate agar for overnight growth of transformants.

Transformants were selected and frozen in pools of 1000 clones in skim milk containing 20% glycerol (vol/vol). A 68,000 member gfp promoter probe library was generated. Using the probability calculation of Clarke and Carbon (Cell, 9: 91-99, 1976), to achieve a 99% probability of having a given DNA sequence represented in a library of 300 bp fragments of strain 86-028NP DNA (1.8×10⁶ bp/genome), a library of 27,629 clones was needed. Therefore, the present library represents 2.5-fold coverage of the 86-028NP genome.
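
The library-size figures above can be checked with a short calculation. The following is a minimal sketch that assumes the usual form of the Clarke and Carbon estimate, N = ln(1 − P)/ln(1 − f), where P is the desired probability and f is the insert size divided by the genome size. The function name clones_needed is illustrative only and does not come from the original work.

```python
import math


def clones_needed(probability: float, insert_bp: float, genome_bp: float) -> int:
    """Clarke-Carbon estimate of library size: N = ln(1 - P) / ln(1 - f)."""
    f = insert_bp / genome_bp  # fraction of the genome carried by one insert
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - f))


# Values taken from the text: 99% probability, 300 bp inserts, 1.8e6 bp genome
n_clones = clones_needed(0.99, 300, 1.8e6)

# Coverage of the 68,000-member library relative to the required size
coverage = 68_000 / n_clones

print(n_clones, round(coverage, 1))  # prints: 27629 2.5
```

Under these assumptions the calculation reproduces both the 27,629-clone requirement and the approximately 2.5-fold coverage stated above.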

In order to assess the quality of the library, fifty clones were selected at random, grown overnight on chocolate agar, and the plasmids were isolated and insert DNA sequenced. A majority (64%) of the selected clones had insert sizes ranging between 200 and 500 bp, while 32% exceeded 500 bp. The majority of inserts showed homology to unique H. influenzae strain Rd open reading frames (ORFs), and 15 clones had sequence unique to strain 86-028NP DNA. Of those clones with homology to strain Rd, 60% were in the correct orientation, 36% of which contained sequence upstream of an ORF. Although a majority of clones had an insert size less than 500 bp, no correlation was found between small insert size and increased GFP expression. In fact, four clones exhibited slight to moderate fluorescence in vitro, 3 of which had insert sizes between 200 and 500 base pairs and one of which had an insert greater than 700 base pairs.

A fraction of the library (approximately 1000 clones) was grown on chocolate agar, harvested in PBS and analyzed by flow cytometry for GFP fluorescence. Compared to strain 86-028NP/pRSM2169, which contains the promoter trap vector without insert DNA, the pool of library clones displayed an increased fluorescence intensity. Thus, the library contains clones with promoters at varying levels of activity.

EXAMPLE 4 Analysis of 86-028NP Derivatives Expressing GFP

In order to establish the FACS parameters necessary to identify and sort gfp-expressing bacteria, a panel of isolates demonstrating varying levels of gfp expression was utilized. Background fluorescence was assessed using strain 86-028NP/pRSM2169 (negative control); therefore, any observed fluorescence would be due to the lacZ promoter driving gfp expression. However, this strain does not produce detectable levels of GFP and, in fact, does not demonstrate increased fluorescence when compared to the parent strain 86-028NP. A high-level gfp-expressing isolate was generated by cloning a 500 bp fragment containing the strong promoter for outer membrane protein P2 expression into SalI-BamHI digested pRSM2167. This plasmid was transformed into 86-028NP by electroporation, generating the high-level gfp-expressing strain 86-028NP/pRSM2211 (highly fluorescent control). This strain demonstrated an approximately 100-fold increase in GFP fluorescence compared to strain 86-028NP/pRSM2169. An intermediate fluorescent derivative clone, 86-028NP/pKMM4B5 (intermediate fluorescent control), was isolated by FACS analysis and used both in preliminary experiments and as a control for cell sorting. The DNA fragment containing a promoter driving gfp expression in vitro is unique to strain 86-028NP, having no known homology to DNA of other organisms. This clone exhibits an approximately 10-fold increase in fluorescence compared to strain 86-028NP/pRSM2169.

The control strains were resuspended from growth on chocolate agar and labeled with cross-reactive Phycoprobe R-PE anti-human IgG (H+L) antibody (10 μg/ml in 100 μl PBS; Biomeda Corp) for 30 minutes at 4° C. Following three successive washes to remove unbound antibody, bacteria were resuspended in 300 μl Dulbecco's Phosphate Buffered Saline (DPBS) for FACS analysis. These control preparations were used to set the appropriate size and fluorescence gates using a Coulter Epics Elite flow cytometer (Coulter Corp.) equipped with an argon laser emitting at 488 nm. Bacteria were gated for size based on log forward angle and side scatter detection and for sorting by FITC/PE labeling of bacteria. Sorted cells were collected into cold sBHI and plated on chocolate agar. After overnight growth, cells were collected for a secondary round of infection or were individually selected and grown overnight, screened by individual clone for fluorescence when grown in vitro, and frozen in skim milk containing 20% (vol/vol) glycerol prior to plasmid isolation and sequencing of insert DNA. Sorting efficiency of control strains was confirmed using a Coulter EPICS flow cytometer (Coulter Corp.).

Many plasmids are segregated rapidly in vitro in the absence of antibiotic selection. Thus, in order to assess whether the promoter trap vector used here was prone to this event, a single colony of strain 86-028NP/pRSM2211 (highly fluorescent control) was isolated on chocolate agar and passaged 20 times in the absence of antibiotic selection. No significant decrease in fluorescence intensity was observed when compared to bacteria grown in the presence of antibiotic. In addition, the plasmid was maintained in the absence of antibiotic selection in vivo: similar bacterial counts were observed when bacteria-containing middle ear fluids collected from a chinchilla were plated on chocolate agar with or without kanamycin. These data demonstrate that the promoter trap vector was stably maintained in the absence of antibiotic selection.

Beyond plasmid stability, GFP toxicity was a potential concern. Early studies on the use of GFP as a reporter to study host-pathogen interactions demonstrated that GFP could be continuously synthesized as a cytoplasmic protein with low toxicity, having minimal effects on the bacterial cell-surface dynamics (Chalfie et al., Science, 263: 802-805, 1994). The construction of a high-level gfp-expressing derivative allowed the assessment of GFP toxicity in NTHi. Growth curves of both the wild-type strain (86-028NP) and the high GFP-producing strain 86-028NP/pRSM2211 were compared when grown under similar conditions. The growth rates were similar, indicating that GFP expression was not toxic to the cells.

The 86-028NP gfp-expressing derivatives were used to define the parameters for efficient cell sorting. Strain 86-028NP/pRSM2169 was mixed with the intermediate gfp-expressing derivative, strain 86-028NP/pKMM4B5, at a 100:1 ratio, simulating the in vivo environment that is expected to contain a small percentage of gfp-expressing clones relative to the total bacterial population. This mixture was subjected to FACS analysis, collecting the 1.8% most fluorescent population and the 52% least fluorescent population. Flow cytometric analysis of the sorted populations revealed an enrichment of strain 86-028NP/pKMM4B5 to 65% of the bacterial population, a phenomenon that was not observed when sorting on the negative population. Subsequent rounds of sorting would be expected to further enrich for this intermediate fluorescent population. The inability to decrease the amount of fluorescent bacteria in the negative sort was attributed to the size of the gate set for negative sorting. GFP-negative cells were enriched by gating on the 10% least fluorescent population.
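
To put the sorting result in perspective, the enrichment achieved in one round can be expressed as a fold change in the fraction of fluorescent cells. The sketch below is illustrative arithmetic only; the function name fold_enrichment is hypothetical, and the only inputs are the 100:1 mixing ratio and the 65% post-sort fraction stated above.

```python
def fold_enrichment(fraction_before: float, fraction_after: float) -> float:
    """Ratio of the target-cell fraction after sorting to the fraction before."""
    return fraction_after / fraction_before


# 100 parts negative control to 1 part pKMM4B5 gives 1 fluorescent cell in 101
fraction_before = 1 / (100 + 1)

# Fraction of pKMM4B5 in the positively sorted population, from the text
fraction_after = 0.65

enrichment = fold_enrichment(fraction_before, fraction_after)
print(f"{fraction_before:.2%} -> {fraction_after:.0%}: about {enrichment:.0f}-fold")
```

On these numbers a single positive sort raises the fluorescent fraction from roughly 1% to 65%, which is consistent with the expectation that further rounds would enrich the population still more.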

EXAMPLE 5 Direct Labeling of Bacteria from Middle Ear Fluids

A similar strategy (as described in Example 4) was applied to sort fluorescent clones from effusions obtained from the chinchilla middle ear during acute otitis media (AOM). Our ability to use differential fluorescence induction (DFI) in vivo was dependent upon our ability to sort gfp-expressing bacteria from non-fluorescent bacteria, fluorescent and non-fluorescent cellular debris, and eukaryotic cells.

Healthy adult chinchillas (Chinchilla lanigera) with no evidence of middle ear infection by either otoscopy or tympanometry were used to screen the library for promoter activity in vivo. Two pools of the NTHi/pRSM2169 library (1000 clones each) were grown overnight on chocolate agar containing kanamycin. The library was combined and diluted in cold 10 mM sterile PBS to 3.3×10⁶ CFU/ml and 300 μl (1.0×10⁶ CFU; 500 CFU/clone) was used to inoculate the left and the right chinchilla transbullar cavity (2000 clones/ear). OM development was monitored by video otoscopy and tympanometry at 24 and 48 hours. The bacteria multiplied in the middle ear cavity, reaching a concentration 500 times the inoculum dose by 48 hours, as expected (Bakaletz et al., Infect. Immunity, 67: 2746-62, 1999). This bacterial adaptation to the host environment results in an inflammatory response, indicated by erythema, vessel dilation and bulging of the tympanic membrane, infiltration of polymorphonuclear cells (PMNs), and accumulation of fluid in the middle ear cavity as observed by otoscopy and microscopic examination of recovered effusions. Twenty-four and 48 hours after inoculation, middle ear fluids were retrieved by epitympanic tap and prepared for FACS.
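
The inoculum figures quoted above follow from simple dose arithmetic, sketched here for clarity. The constant and function names are illustrative only; the numerical inputs (3.3×10⁶ CFU/ml, 300 μl, two pools of 1000 clones) are those given in the text.

```python
# Values taken from the text
CFU_PER_ML = 3.3e6       # concentration of the diluted library
INOCULUM_ML = 0.300      # 300 microliters delivered per ear
POOLS = 2                # number of library pools combined
CLONES_PER_POOL = 1000   # clones in each pool


def inoculum_summary(cfu_per_ml: float, volume_ml: float, n_clones: int):
    """Return (total CFU delivered, average CFU per clone)."""
    total_cfu = cfu_per_ml * volume_ml
    return total_cfu, total_cfu / n_clones


total_cfu, cfu_per_clone = inoculum_summary(
    CFU_PER_ML, INOCULUM_ML, POOLS * CLONES_PER_POOL
)
print(f"{total_cfu:.2e} CFU per ear, {cfu_per_clone:.0f} CFU per clone")
```

The exact products are about 9.9×10⁵ CFU per ear and about 495 CFU per clone, which the text rounds to 1.0×10⁶ CFU and 500 CFU/clone.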

It is important to note that this analysis was limited to those bacteria recoverable in the middle ear fluid. In some cases it was necessary to lavage the middle ear cavity to collect the bacteria for FACS analysis. Thus, this analysis includes genes up-regulated when NTHi are loosely adherent to mucosae. NTHi has been observed to form a biofilm in the middle ear cavity in a chinchilla model of OM (Ehrlich et al., JAMA, 287: 1710-5, 2002). Since the protocols described herein select for clones recovered from the planktonic population, recovery of those clones in which genes are up-regulated when the bacteria are associated with mucosal biofilms is not expected. Homogenization of middle ear mucosae and subsequent bacterial cell isolation, however, would enable us to recover these clones. It is also possible that some GFP-expressing clones were recovered in the effusion, yet were adherent to eukaryotic cells present in the effusion as exfoliated cells, or in aggregates. These bacteria are difficult to recover from the effusion without compromising the sorting efficiency. Therefore, the middle ear fluids were treated with a mucolytic agent, then centrifuged to remove large aggregates and eukaryotic cells prior to labeling.

Chinchilla middle ear fluids were diluted, if necessary, to 250 μl with sterile saline. An equal volume of N-acetyl-L-cysteine (0.5%; w/v) in DPBS (pH 7.4) was added for 5 minutes at room temperature as a mucolytic agent (Miyamoto and Bakaletz, Microb. Pathog., 21: 343-356, 1996). Fluids were centrifuged (300×g, 5 min) to remove cellular debris, red blood cells and inflammatory cells, and supernatants containing bacteria were transferred to a fresh tube. Bacteria were incubated with chinchilla antiserum (1:50 dilution) directed against a whole OMP preparation, derived from NTHi strain 86-028NP, for 45 minutes at 4° C., pelleted by centrifugation (2000×g, 5 min) and washed twice with cold DPBS containing 0.05% bovine serum albumin. Bacteria were subsequently labeled with cross-reactive Phycoprobe R-PE anti-human IgG (H+L) antibody (10 μg/ml in 100 μl PBS; Biomeda Corp) for 30 minutes at 4° C. Following three successive washes to remove unbound antibody, cells were resuspended in 300 μl DPBS for FACS analysis.

EXAMPLE 6 Identification of Promoters Induced In Vivo in Acute Otitis Media

H. influenzae 86-028NP transformed with the promoter trap library was grown overnight on chocolate agar. To select against those clones containing promoters that expressed gfp in vitro, the library was subjected to one round of FACS analysis (as described above), collecting only those clones expressing low-level amounts of GFP. These clones were pooled and used to inoculate the chinchilla middle ear transbullarly. Following 24 and 48 hours of infection, bacteria-containing effusions were removed by epitympanic tap. Bacteria were indirectly labeled with R-PE-labeled antibody and subjected to FACS analysis by gating on fluorescently tagged bacteria but sorting for those that were also expressing GFP. These clones were used to reinfect animals for further enrichment. Following the final round of sorting, single colony isolates were screened in vitro for lack of fluorescence.

Those clones isolated by FACS analysis (positive for GFP fluorescence in vivo) that did not emit fluorescence in vitro were prepared for plasmid isolation and identification of insert DNA sequence. These clones were grown overnight on chocolate agar plates containing kanamycin and prepared for plasmid isolation using the Qiaprep Miniprep Kit (Qiagen) according to the manufacturer's protocol. Plasmid insert DNA was sequenced using the primer 5′-TGCCCATTAACATCACCATCTA-3′ (SEQ ID NO: 588) that is complementary to the gfpmut3 gene and downstream of the insert DNA. Sequencing reactions were performed using the ABI Prism BigDye® terminator cycle sequencing ready reaction kit (Applied Biosystems) according to the manufacturer's protocol using a GeneAmp PCR System 9700 (Applied Biosystems). The sequencing reaction products were then purified by passage through Sephadex G-50 in a 96-well multiscreen HV plate (Millipore) and subsequently analyzed on an ABI Prism 3100 DNA analyzer (Applied Biosystems).

Insert sequences were compared to the complete annotated sequence of H. influenzae strain Rd. Those inserts with no nucleotide homology to strain Rd were subsequently analyzed using the BLASTN and BLASTX algorithms. Further sequence analysis was performed with DNASTAR (Madison, Wis.). Inserts in the correct orientation and containing sequence 5′ to a predicted ORF contained a putative promoter that was preferentially active when the NTHi bacteria were in the chinchilla middle ear.

Fifty-two clones with putative promoters that were regulated in vivo were isolated. Of the 44 candidate clones containing sequence similar to that identified in H. influenzae strain Rd, quantitative comparison of gene expression in vitro and in vivo confirmed up-regulated gene expression for twenty-six genes (60%) when NTHi respond to environmental cues present in the chinchilla middle ear; these genes are summarized in Table 4A below. The in vivo-regulated promoters drive expression of genes predicted to be involved in membrane transport, environmental informational processing, cellular metabolism and gene regulation, as well as genes encoding hypothetical proteins with unknown function.

In order to confirm the induction of putative promoter candidates in vivo, the relative amount of messenger RNA expression was compared when NTHi strain 86-028NP was grown in vitro to mid-log phase or in vivo for 48 hours. The RNA was isolated using TRIzol LS reagent (Gibco Life Technologies) according to the manufacturer's protocol. DNA was removed from the RNA preparation using the DNA-free kit (Ambion) according to the manufacturer's protocol. DNase I treated RNA samples were purified by passage through a Qiagen RNeasy column. RNA purity and integrity were assessed by 260/280 nm spectrophotometer readings and on the Agilent 2100 Bioanalyzer (Agilent Technologies), respectively.

In order to independently confirm the FACS data, we determined the relative expression of candidate genes by quantitative RT-PCR. The parent strain 86-028NP was used for these studies. Transcription levels were assessed by real-time quantitative RT-PCR using the one-step QuantiTect SYBR Green RT-PCR kit (Qiagen) according to the manufacturer's instructions. Briefly, using primers generated to an open reading frame downstream of the putative in vivo-induced promoters identified by FACS analysis, gene-specific mRNA was reverse transcribed and amplified by RT-PCR on the ABI Prism 7700 sequence detection system (Applied Biosystems). The amount of product was calculated using a standard curve generated from known amounts of bacterial genomic DNA (10²-10⁷ genomic copies DNA) by amplifying a fragment of the gyrase (gyr) gene. Controls were analyzed in parallel to verify the absence of DNA in the RNA preparation (-RT control) as well as the absence of primer dimers in control samples lacking template RNA. In addition, RT-PCR products were analyzed by gel electrophoresis and, in all cases, a single product was observed at the appropriate base pair size. Amounts of bacterial RNA between samples were normalized relative to gyr expression, which was shown to be constitutively expressed under the various in vitro growth conditions that we tested. Relative gene expression in vivo was compared to that of gene expression in vitro, and data expressed as fold-increase are summarized in Table 4A.
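
The quantitation steps described above (standard curve, normalization to gyr, fold-increase in vivo versus in vitro) can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis actually performed: all threshold-cycle (Ct) values are hypothetical, the function names are invented for this example, and the sketch assumes that a single gyr-derived standard curve is used to convert Ct values to copy numbers for both the candidate gene and gyr.

```python
def fit_standard_curve(log10_copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate template copies from a Ct value."""
    return 10 ** ((ct - intercept) / slope)


def fold_induction(target_vivo, gyr_vivo, target_vitro, gyr_vitro):
    """gyr-normalized expression in vivo divided by that in vitro."""
    return (target_vivo / gyr_vivo) / (target_vitro / gyr_vitro)


# Hypothetical standards spanning 10^2 to 10^7 genomic copies
standards_log10 = [2, 3, 4, 5, 6, 7]
standards_ct = [33.0, 29.7, 26.4, 23.1, 19.8, 16.5]
slope, intercept = fit_standard_curve(standards_log10, standards_ct)

# Hypothetical sample Ct values for one candidate gene and for gyr
target_vivo = copies_from_ct(23.1, slope, intercept)
gyr_vivo = copies_from_ct(26.4, slope, intercept)
target_vitro = copies_from_ct(26.4, slope, intercept)
gyr_vitro = copies_from_ct(26.4, slope, intercept)

fold = fold_induction(target_vivo, gyr_vivo, target_vitro, gyr_vitro)
print(f"slope={slope:.2f}, intercept={intercept:.1f}, fold induction={fold:.1f}")
```

With these made-up Ct values the candidate gene comes out at about 10-fold induced in vivo; real data would of course give values such as those reported in Table 4A.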

The 8-fold sequencing of the NTHi genome identified the full length open reading frames for the majority of genes listed in Table 4A. Table 4B provides the full length nucleotide sequence within the NTHi genome and the corresponding amino acid sequence. The fold induction of the gene due to environmental cues present in the chinchilla middle ear and the product or function of the gene are repeated in Table 4B for convenience.

TABLE 4A

| Category | Gene or ORF | SEQ ID NO: | GenBank Protein ID | Fold Induction | Product or Function |
|---|---|---|---|---|---|
| Amino acid metabolism | hisB | 589 | NP_438632 | 2.9 | Histidine biosynthesis bifunctional protein |
| Lipoprotein | lppB | 590 | NP_438862.1 | 2.6 | Lipoprotein B homologue |
| Membrane transport | sapA | 591 | NP_439780.1 | 2.8 | Peptide ABC transporter; periplasmic SapA precursor |
| | lolA | 592 | NP_439736.1 | 2.4 | Outer membrane lipoproteins carrier protein precursor |
| | rbsC | 593 | NP_438661.1 | 5.1 | Ribose transport system permease protein |
| Purine synthesis | purE | 594 | NP_439757.1 | 51.7 | Phosphoribosylaminoimidazole carboxylase catalytic subunit; PurE |
| Biosynthetic and metabolic functions | ribB | 595 | NP_438923.1 | 8.3 | 3,4-dihydroxy-2-butanone 4-phosphate synthase; riboflavin biosynthesis |
| | arcB | 596 | NP_438753.1 | 10 | Ornithine carbamoyltransferase; arginine degradation |
| | uxuA | 597 | NP_438228.1 | 3.1 | Mannonate dehydratase; production of glyceraldehyde 3-phosphate |
| | dsbB | 598 | NP_438589.1 | 2.6 | Disulfide oxidoreductase; disulfide bond formation protein B |
| | ureH | 599 | NP_438693.1 | 3.9 | Urease accessory protein |
| | licC | 600 | NP_439688.1 | 2.3 | Phosphocholine (ChoP) cytidylyltransferase |
| | HI1647 | 601 | NP_439789.1 | 2.0 | Putative pyridoxin biosynthesis protein; singlet oxygen resistance protein |
| DNA replication, repair | ispZ | 602 | P43810 | 2.5 | Probable intracellular septation protein |
| | radC | 603 | NP_439113.1 | 2.1 | DNA repair protein |
| | mukF | 604 | P45185 | 2.0 | MukF protein homologue; remodeling of nucleoid structure |
| Gene regulation | glpR | 605 | NP_438777.1, NP_439170.1 | 2.8 | Glycerol-3-phosphate regulon repressor |
| | ihfB | 606 | P43724 | 2.5 | Integration host factor beta subunit |
| | argR | 607 | NP_439365.1 | 2.7 | Arginine repressor |
| | cspD | 608 | NP_439584.1 | 2.1 | Cold shock like protein; stress response protein |
| Hypothetical or unknown proteins | HI0094 | 609 | NP_438267.1 | 8.3 | Hypothetical protein |
| | HI1163 | 610 | NP_439321.1 | 2.3 | Conserved hypothetical protein; putative oxidase |
| | HI1063 | 611 | NP_439221.1 | 2.7 | Hypothetical protein |
| | HI0665 | 612 | NP_438824.1 | 2.8 | Hypothetical protein |
| | HI1292 | 613 | NP_439444.1 | 2.6 | Hypothetical protein |
| | HI1064 | 614 | NP_439222.1 | 2.6 | Hypothetical protein |

TABLE 4B

| Category | Gene or ORF | Full Length Nucleotide Sequence | Amino Acid Sequence | Location in Contig | Fold Induction | Product or Function |
|---|---|---|---|---|---|---|
| Amino acid metabolism | hisB | SEQ ID NO: 615 | SEQ ID NO: 616 | nt. 68378–67290 of SEQ ID NO: 680 (contig 13) | 2.9 | Histidine biosynthesis bifunctional protein |
| Membrane transport | sapA | SEQ ID NO: 617 | SEQ ID NO: 618 | nt. 200403–198709 of SEQ ID NO: 685 (contig 18) | 2.8 | Peptide ABC transporter; periplasmic SapA precursor |
| | rbsC | SEQ ID NO: 619 | SEQ ID NO: 620 | nt. 42773–41802 of SEQ ID NO: 680 (contig 13) | 5.1 | Ribose transport system permease protein |
| Purine synthesis | purE | SEQ ID NO: 621 | SEQ ID NO: 622 | nt. 219625–219131 of SEQ ID NO: 685 (contig 18) | 51.7 | Phosphoribosylaminoimidazole carboxylase catalytic subunit; PurE |
| Biosynthetic and metabolic functions | ribB | SEQ ID NO: 623 | SEQ ID NO: 624 | nt. 131537–132184 of SEQ ID NO: 682 (contig 15) | 8.3 | 3,4-dihydroxy-2-butanone 4-phosphate synthase; riboflavin biosynthesis |
| | arcB | SEQ ID NO: 625 | SEQ ID NO: 626 | nt. 49710–48706 of SEQ ID NO: 681 (contig 14) | 10 | Ornithine carbamoyltransferase; arginine degradation |
| | uxuA | SEQ ID NO: 627 | SEQ ID NO: 628 | nt. 840671–841855 of SEQ ID NO: 685 (contig 18) | 3.1 | Mannonate dehydratase; production of glyceraldehyde 3-phosphate |
| | dsbB | SEQ ID NO: 629 | SEQ ID NO: 630 | nt. 388050–388583 of SEQ ID NO: 684 (contig 17) | 2.6 | Disulfide oxidoreductase; disulfide bond formation protein B |
| | ureH | SEQ ID NO: 631 | SEQ ID NO: 632 | nt. 4452–5267 of SEQ ID NO: 680 (contig 13) | 3.9 | Urease accessory protein |
| | licC | SEQ ID NO: 633 | SEQ ID NO: 634 | nt. 355083–354382 of SEQ ID NO: 685 (contig 18) | 2.3 | Phosphocholine (ChoP) cytidylyltransferase |
| | HI1647 | SEQ ID NO: 635 | SEQ ID NO: 636 | nt. 664017–664892 of SEQ ID NO: 685 (contig 18) | 2.0 | Putative pyridoxin biosynthesis protein; singlet oxygen resistance protein |
| DNA replication, repair | ispZ | SEQ ID NO: 637 | SEQ ID NO: 638 | nt. 4512–5069 of SEQ ID NO: 683 (contig 16) | 2.5 | Probable intracellular septation protein |
| | radC | SEQ ID NO: 639 | SEQ ID NO: 640 | nt. 132695–132030 of SEQ ID NO: 683 (contig 16) | 2.1 | DNA repair protein |
| | mukF | SEQ ID NO: 641 | SEQ ID NO: 642 | nt. 504549–503215 of SEQ ID NO: 685 (contig 18) | 2.0 | MukF protein homologue; remodeling of nucleoid structure |
| Gene regulation | glpR | SEQ ID NO: 643 | SEQ ID NO: 644 | nt. 72716–73483 of SEQ ID NO: 682 (contig 15) | 2.8 | Glycerol-3-phosphate regulon repressor |
| | ihfB | SEQ ID NO: 645 | SEQ ID NO: 646 | nt. 661004–660720 of SEQ ID NO: 685 (contig 18) | 2.5 | Integration host factor beta subunit |
| | argR | SEQ ID NO: 647 | SEQ ID NO: 648 | nt. 178540–178085 of SEQ ID NO: 685 (contig 18) | 2.7 | Arginine repressor |
| | cspD | SEQ ID NO: 649 | SEQ ID NO: 650 | nt. 435310–435528 of SEQ ID NO: 685 (contig 18) | 2.1 | Cold shock like protein; stress response protein |
| Hypothetical or unknown proteins | HI1163 | SEQ ID NO: 651 | SEQ ID NO: 652 | nt. 137202–134119 of SEQ ID NO: 685 (contig 18) | 2.3 | Conserved hypothetical protein; putative oxidase |
| | HI1063 | SEQ ID NO: 653 | SEQ ID NO: 654 | nt. 35158–34937 of SEQ ID NO: 685 (contig 18) | 2.7 | Hypothetical protein |
| | HI0665 | SEQ ID NO: 655 | SEQ ID NO: 656 | nt. 17949–18980 of SEQ ID NO: 679 (contig 12) | 2.8 | Hypothetical protein |
| | HI1292 | SEQ ID NO: 657 | SEQ ID NO: 658 | nt. 555002–555799 of SEQ ID NO: 685 (contig 18) | 2.6 | Hypothetical protein |

EXAMPLE 7 Identification of Virulence-Associated Genes

In many bacterial species, a subset of virulence-associated genes is regulated by errors in replication of short repeats. These repeats may be 5′ to a gene or in the coding sequence, and their presence is an indication of controlled expression of the gene, which indicates association with virulence. Addition or deletion of a repeat results in the expression or lack of expression of the particular virulence determinant.

The NTHi H. influenzae strain 86-028NP contig set was queried for short oligonucleotide repeats. The region surrounding the repeats was analyzed to identify the gene(s) associated with the repeat. Table 5 lists the identified repeats and the ORF (identified by BLAST) associated with each repeat.
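
The text does not state how the contig set was queried, so the following is only an illustrative sketch of one way such a search for short tandem repeats could be carried out. The function name find_tandem_repeats, the unit-length range (2 to 8 nt), the minimum copy number (4) and the example sequence are all assumptions made for this example and are not taken from the original work.

```python
import re


def find_tandem_repeats(seq, min_unit=2, max_unit=8, min_copies=4):
    """Return (start, end, unit, copies) for each tandem repeat found.

    Coordinates are 1-based and inclusive. The shortest repeating unit
    is reported because the unit group is matched lazily.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start() + 1, match.end(), unit, copies))
    return hits


# Hypothetical contig fragment carrying five copies of a 4-nt unit
example = "GGAC" + "CAAT" * 5 + "GCGC"
for start, end, unit, copies in find_tandem_repeats(example):
    print(f"nt. {start}-{end}: ({unit}) x {copies}")
```

In practice each hit would then be extended with flanking sequence and compared against a database (for example with BLAST, as in Table 5) to identify the associated ORF.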

Further sequence analysis has identified the full length nucleotide sequence of the virulence-associated genes and the corresponding amino acid sequences encoded by the ORF. The derived amino acid sequences are highly homologous to the listed GenBank sequence.

TABLE 5

| Repeat | Location in 3-fold Contigs | Location in 8-fold Contigs | Full Length Nucleotide Sequence | Amino Acid Sequence | GenBank Accession No. |
|---|---|---|---|---|---|
| SEQ ID NO: 581 | 115; nt. 473–540 of SEQ ID NO: 115 | nt. 484533–483643 of SEQ ID NO: 685 (contig 18) | SEQ ID NO: 659 | SEQ ID NO: 660 | NP_439538.1 |
| SEQ ID NO: 582 | 377; nt. 546–597 of SEQ ID NO: 337 | nt. 416274–414910 of SEQ ID NO: 685 (contig 18) | SEQ ID NO: 661 | SEQ ID NO: 662 | P45217 |
| SEQ ID NO: 583 | 505; nt. 310–393 of SEQ ID NO: 505 | nt. 414500–416614 of SEQ ID NO: 684 (contig 17) | SEQ ID NO: 663 | SEQ ID NO: 664 | AAK76425 |
| SEQ ID NO: 584 | 508; nt. 2079–2120 of SEQ ID NO: 508 | nt. 506516–507913 of SEQ ID NO: 685 (contig 18) | SEQ ID NO: 665 | SEQ ID NO: 666 | NP_439520 |
| SEQ ID NO: 585 | 518; nt. 758–789 of SEQ ID NO: 518 | nt. 354274–352406 of SEQ ID NO: 684 (contig 17) | SEQ ID NO: 667 | SEQ ID NO: 668 | NP_284893 |
| SEQ ID NO: 586 | 543; nt. 1814–196 of SEQ ID NO: 543 | nt. 347864–243236 of SEQ ID NO: 685 (contig 18) | SEQ ID NO: 669 | SEQ ID NO: 670 | AAA20524 |
| SEQ ID NO: 586 | 543; nt. 1814–196 of SEQ ID NO: 543 | nt. 699709–704187 of SEQ ID NO: 685 (contig 18) | SEQ ID NO: 671 | SEQ ID NO: 672 | AAD56660 |
| SEQ ID NO: 587 | 567; nt. 13309–13320 of SEQ ID NO: 567 | nt. 85546–84689 of SEQ ID NO: 681 (contig 14) | SEQ ID NO: 673 | SEQ ID NO: 674 | ZP_00053190 |

EXAMPLE 8 Identification of Unique NTHi Gene Sequences

Genes associated with NTHi virulence were also identified by comparing the level of expression of the gene when the NTHi bacterium was infecting a tissue versus the level of expression of the same gene when the NTHi was grown on artificial laboratory media. These novel genes were identified using the promoter trap techniques described above in Examples 4-6, and subsequent comparisons with the known Rd genome demonstrated that these genes are unique to NTHi strain 86-028NP.

The DNA sequences identified using this screening procedure are set forth as SEQ ID NOS: 577-580. These sequences did not contain genes or gene fragments that have homologues in the H. influenzae Rd genome sequence. Even though these are completely novel sequences, due to their expression level during NTHi infection in the chinchilla middle ear, it is likely that these genes are involved in NTHi virulence.

1. An isolated polypeptide comprising the amino acid sequence encoded by a nucleotide sequence of SEQ ID NO: 663.

2. An isolated polypeptide comprising the amino acid sequence of SEQ ID NO: 664.

3. A composition comprising a polypeptide of claim 1 or 2 and a pharmaceutically acceptable carrier.

4. A method for eliciting an immune response to NTHi bacteria comprising administering an immunogenically effective dose of a polypeptide of claim 1 or 2 to a patient at risk of NTHi bacterial infection.